Abstract

This  paper  analyzes  autoregressive  time  series  models  where  the  errors  are  assumed 
to  be  martingale  difference  sequences  with  weak  additional  restrictions  on  their  distribu- 
tion. Under  these  conditions  Quasi  Maximum  Likelihood  estimators  of  the  autoregressive 
parameters  are  no  longer  efficient  in  the  GMM  sense.  The  main  result  of  the  paper  is 
the  construction  of  efficient  semiparametric  instrumental  variables  estimators  for  the  au- 
toregressive parameters.  The  optimal  instruments  are  a  linear  function  of  the  innovation 
sequence. 

It  is  shown  that  a  frequency  domain  approximation  of  the  optimal  instruments  leads  to 
an  estimator  which  only  depends  on  the  data  periodogram  and  an  unknown  linear  filter. 
Semiparametric methods to estimate the optimal filter are proposed.

The procedure is equivalent to GMM estimators where lagged observations are used
as  instruments.  The  number  of  instruments  is  allowed  to  grow  at  the  same  rate  as  the 
sample.  No  lag  truncation  parameters  are  needed  to  implement  the  estimator  which  makes 
it  particularly  appealing  from  an  applied  point  of  view. 


1.  Introduction 

This  paper  develops  new  instrumental  variables  (IV)  estimators  for  autoregressive  time 
series  models  when  the  errors  are  uncorrelated  but  not  independent.  The  specification 
includes  error  processes  which  are  conditionally  heteroskedastic  of  unknown  functional 
form.  Efficiency  gains  are  obtained  without  having  to  specify  a  model  for  the  dependence  in 
the  errors.  The  setup  is  general  enough  to  account  for  stylized  facts  in  many  economic  time 
series  displaying  features  such  as  thick  tailed  distributions  and  time  dependent  conditional 
variances. 

Classical  efficiency  results  for  the  quasi  maximum  likelihood  estimator  (QMLE)  of  the 
autoregressive  parameters  such  as  Hannan  (1973)  depend  on  independence  of  the  errors. 
In the more general case of conditional heteroskedasticity considered here the QMLE no
longer attains a GMM lower bound for the asymptotic covariance matrix, which now
depends on fourth moments of the innovation process. In Kuersteiner (1997) it is shown
how  a  decomposition  of  the  higher  moment  terms  leads  to  a  lower  bound  for  the  covariance 
matrix.  An  instrumental  variables  estimator  based  on  this  decomposition  is  shown  to 
achieve  the  lower  bound  for  the  covariance  matrix  in  the  class  of  IV  estimators  with 
instruments  that  are  linear  in  the  innovations. 

The  feasible  GMM  estimators  developed  in  this  paper  are  similar  to  the  estimators 
of  Hayashi  and  Sims  (1983),  Stoica,  Soderstrom  and  Friedlander  (1985)  and  Hansen  and 
Singleton  (1991,  1996).  In  this  literature  lagged  observations  are  used  as  instruments  to 
account  for  unmodelled  MA(q)  innovations  which  lead  to  inconsistent  OLS  estimators. 
Here  lagged  observations  are  used  as  instruments  to  account  for  unmodelled  conditional 
heteroskedasticity  of  the  error  terms  which  renders  OLS  inefficient. 

Apart from the different motivation for the use of instruments, inefficiency versus
inconsistency, this paper extends the previous literature as it explicitly treats feasible estimation.
The  number  of  instruments  in  our  case  is  allowed  to  grow  at  the  same  rate  as  the  sample 
size.  This  is  made  possible  at  the  cost  of  an  additional  restriction  on  the  fourth  order  cu- 
mulants  compared  to  the  treatment  in  Kuersteiner  (1997).  The  advantage  of  making  this 
assumption  is  that  the  estimator  can  be  implemented  without  the  need  for  a  truncation 
or  bandwidth  parameter  for  the  number  of  instruments  used. 

Since  the  optimal  instruments  are  unobservable  they  need  to  be  estimated  nonpara- 
metrically.  Assumptions  about  the  generating  mechanism  of  the  volatility  process  or  more 
generally  the  dependence  in  higher  moments  are  replaced  by  smoothness  assumptions  for 
higher  order  cumulant  spectra  of  the  errors.  This  setup  allows  for  the  treatment  of  de- 
pendence in  higher  moments  as  a  nuisance  parameter.  Nonparametric  estimators  of  this 
nuisance  parameter  are  used  to  construct  the  optimal  instruments. 

Other  semiparametric  procedures  proposed  to  handle  conditional  heteroskedasticity 
include  Robinson  (1987)  and  Newey  (1991).  No  parametric  assumptions  about  the  form 
of  conditional  heteroskedasticity  are  made  in  these  treatments.  However,  in  order  to  es- 
timate the  conditional  variance  these  authors  have  to  assume  independent  errors.  This 
assumption  has  precluded  direct  application  of  their  techniques  to  the  stochastic  condi- 
tional variance  case.  Hidalgo  (1992)  relaxes  the  iid  assumption  for  the  errors  but  has 
to assume instead that the conditional variance is a smooth function of an independent
stationary process. Hansen (1995) treats the stochastic volatility model in a semiparametric
GLS framework. He assumes that the conditional variance process converges to a
Brownian  motion  in  the  limit.  Sample  path  continuity  of  the  limit  process  then  allows  for 
consistent  kernel  estimation  of  the  conditional  variance. 

More generally Hansen (1985) and Hansen, Heaton and Ogaki (1988) prove existence
of instrumental variable estimators achieving a GMM lower bound in the presence of
conditional heteroskedasticity. The high dimensional and nonlinear character of these
instruments has so far precluded implementation of such an estimator.

Here  we  limit  ourselves  to  the  implementation  of  optimal  procedures  in  the  much 
smaller  class  of  IV  estimators  with  instruments  that  are  linear  functions  of  the  observa- 
tions. Ordinary  least  squares  is  a  particular  member  of  this  class.  In  the  general  case  of 
conditional heteroskedasticity it is inefficient. This paper constructs a
feasible version of the most efficient IV estimator with linear instruments.

The remainder of the paper is organized as follows. Section 2 describes the model
assumptions. Section 3 develops the efficient IV estimator for the AR(p) model. A frequency
domain approximation for the IV estimator is derived in Section 4. Section 5 shows that a
semiparametric estimator with the same optimality properties can be constructed. Some
Monte Carlo simulations are reported in Section 6 and concluding remarks are made in
Section 7. Some important lemmas are quoted in Appendix A while the main results of
the paper are proved in Appendix B.

2.  Model 

We start by defining the stochastic environment of the model. Let $(\Omega, \mathcal{F}, P)$ be a general
probability space and define a filtration $\mathcal{F}_t$ to be an increasing sequence of $\sigma$-fields such
that $\mathcal{F}_t \subseteq \mathcal{F}_{t+1} \subset \mathcal{F}$ for all $t$. There is a doubly infinite sequence of random variables generating
the filtration $\mathcal{F}_t$. We assume that we observe a sample of size $n$ of a univariate time series
$y_t$ where $t \in \{1, \ldots, n\}$. More specifically, we assume that $y_t$ is generated by the following
autoregressive model

$$\phi(L) y_t = \varepsilon_t \qquad (1)$$

where $\varepsilon_t$ is a martingale difference sequence generating $\mathcal{F}_t$. Here $\phi(L) = 1 - \phi_1 L - \ldots - \phi_p L^p$,
and $\phi' = (\phi_1, \ldots, \phi_p)$ is the vector of parameters describing the mean equation of the model. It
is assumed that $\phi(L)$ has all roots outside the unit circle. We are interested in estimating
the parameter vector $\phi$. The martingale difference assumption for $\varepsilon_t$ implies absence of
correlation between the errors. However, it is not assumed that the errors are independent.
Rather we allow for dependence in higher than second moments to account for thick tails
and conditional heteroskedasticity.

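Assumption A-1. (i) $\varepsilon_t$ is strictly stationary and ergodic, $E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$, $E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2$,
$E(\varepsilon_t^2) = \sigma^2 < \infty$, (ii) $\phi(L)$ has all roots outside the unit circle, (iii) $E(\varepsilon_t^2 \varepsilon_{t-s}^2 - \sigma^4) = \sigma(s) < \infty$
for $s > 0$, (iv) $E(\varepsilon_t^2 \varepsilon_{t-s} \varepsilon_{t-r}) = 0$ for $s \neq r$, $s, r > 0$, (v) $\sum_s |s|\,|\sigma(s)| = B < \infty$,
$E(\varepsilon_t^2 \varepsilon_{t-s}^2) > \alpha$ for some $\alpha > 0$ and all $s$.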

Remark  1.  Assumption  (iv)  is  added  to  the  assumptions  in  Kuersteiner  (1997)  in  order 
to  simplify  the  form  of  the  optimal  instruments.  It  is  somewhat  restrictive  as  it  rules  out 


some  nonsymmetric  parametric  examples  such  as  EGARCH.  The  IV  estimators  proposed 
in  Section  3  are  still  consistent  and  asymptotically  normal  if  (iv)  fails.  However,  in  this 
case  they  lose  their  optimality  properties. 

Remark 2. It should be emphasized that no assumptions about third moments are made.
In  particular  this  allows  for  skewness  in  the  error  process. 

Remark 3. GARCH(p,q) processes satisfy Assumption (A-1) under certain conditions.
Nelson (1990) obtains sufficient conditions for stationarity and ergodicity of the GARCH(1,1)
model. The martingale difference property follows immediately from the definition of
a GARCH process. Assumption (A-1 iv) is shown to hold for the ARCH(p) case in
Milhoj (1985) for symmetric innovation densities. The same argument extends to the
GARCH(p,q) case. If the innovation distribution is normal then fourth moments are
known to exist for the GARCH(1,1) case if $3\gamma_1^2 + 2\gamma_1\beta_1 + \beta_1^2 < 1$, where $\gamma_1$ and $\beta_1$ denote the
ARCH and GARCH coefficients. This condition is valid for $\beta_1 = 0$ and thus covers the ARCH case.
In Milhoj (1985) and Bollerslev (1986), the autocorrelation structure $\sigma(s)$ is shown to be
identical to the AR(p) and ARMA(max(p,q), q) case for ARCH(p) and GARCH(p,q) respectively.
This implies that the summability condition holds if fourth moments exist. Similar arguments
can be made to show that stochastic volatility models satisfy the assumptions.
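
As an illustration of the setting in Remark 3, the following sketch simulates an AR(p) process driven by GARCH(1,1) errors with normal innovations and checks the fourth-moment condition quoted above. The function names, the parameter names (`omega` for the intercept of the variance equation, `gamma1` for the ARCH coefficient, `beta1` for the GARCH coefficient), the burn-in length and the seed are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def garch11_fourth_moment_exists(gamma1, beta1):
    """Fourth-moment condition for GARCH(1,1) with normal innovations."""
    return 3.0 * gamma1**2 + 2.0 * gamma1 * beta1 + beta1**2 < 1.0


def simulate_ar_garch11(phi, omega, gamma1, beta1, n, burn=500, seed=0):
    """Simulate y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p} + eps_t.

    eps_t = sqrt(h_t) * u_t with u_t iid N(0, 1) and
    h_t = omega + gamma1 * eps_{t-1}^2 + beta1 * h_{t-1}.
    Returns the series and the errors after dropping the burn-in.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    total = n + burn
    u = rng.standard_normal(total)
    eps = np.zeros(total)
    y = np.zeros(total)
    h = omega / (1.0 - gamma1 - beta1)  # start at the unconditional variance
    for t in range(total):
        if t > 0:
            h = omega + gamma1 * eps[t - 1] ** 2 + beta1 * h
        eps[t] = np.sqrt(h) * u[t]
        # Only the lags that already exist are used in the first p periods.
        lags = y[max(t - p, 0):t][::-1]  # [y_{t-1}, ..., y_{t-p}]
        y[t] = phi[:len(lags)] @ lags + eps[t]
    return y[burn:], eps[burn:]
```

In such a simulation the errors are serially uncorrelated while their squares are positively autocorrelated, which is the kind of dependence Assumption (A-1) is meant to allow.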

Based on the results in Kuersteiner (1997), we will now introduce the optimal instrumental
variables estimator for the AR(p) model. The estimator is constructed by reweighting
the innovation sequence by the unconditional fourth moments $\sigma(k) + \sigma^4$ of the error
process. Without parametric assumptions about the form of conditional heterogeneity
these moments typically have to be estimated.

3.  Instrumental  Variables  Estimator 

The parameter vector $\phi$ can be consistently estimated by OLS. Under the assumptions
in  this  paper  OLS  amounts  to  an  inefficient  IV  estimator  in  the  class  of  IV  estimators 
with  linear  instruments.  Kuersteiner  (1997)  derives  the  form  of  the  optimal  instrument  as  a 
function  of  the  fourth  moments  and  the  impulse  response  function  of  the  underlying  process 
yt  in  a  more  general  context.  In  this  section  these  more  general  results  are  specialized  to 
the  autoregressive  model. 

Let $z_t \in \mathbb{R}^p$ be $\mathcal{F}_{t-1}$ measurable and square integrable, strictly stationary and ergodic.
Then, the instrument satisfies the moment condition $E[(\phi(L) y_t) z_t] = 0$ where $\phi(L) y_t = \varepsilon_t$.
Let $\tilde{y}_t' = [y_t, y_{t-1}, \ldots, y_{t-p}]$ and $\tilde{\phi}' = [1, -\phi']$. An instrumental variables estimator then
is defined as


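$$\hat{\phi} = \arg\min_{\phi} \left\| \sum_{t=p+1}^{n} z_t \, \phi(L) y_t \right\|$$

with an explicit solution in the case of an autoregressive model as

$$\hat{\phi} = [Z'Y_{-1}]^{-1} Z'Y \qquad (1)$$

where $Y' = [y_{1+p}, \ldots, y_n]$ and $Y_{-1}' = [y_p, \ldots, y_{n-1}]$ with $y_t' = [y_t, y_{t-1}, \ldots, y_{t-p+1}]$. Note
that $Y = Y_{-1}\phi + \varepsilon$. By the ergodic theorem it follows that $\hat{\phi}$ is consistent.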
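
The explicit solution $(Z'Y_{-1})^{-1}Z'Y$ can be sketched in a few lines. The helper names `lagged_regressors` and `iv_estimate` are hypothetical, and the instrument matrix `Z` is taken as given; choosing `Z` equal to the lagged regressors reproduces OLS, the member of the class referred to in the text.

```python
import numpy as np


def lagged_regressors(y, p):
    """Stack the rows [y_{t-1}, ..., y_{t-p}] for t = p+1, ..., n."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    return np.column_stack([y[p - j:n - j] for j in range(1, p + 1)])


def iv_estimate(y, Z, p):
    """Return (Z'Y_{-1})^{-1} Z'Y for an (n-p) x p instrument matrix Z."""
    y = np.asarray(y, dtype=float)
    Y = y[p:]                        # dependent variable y_{p+1}, ..., y_n
    Y_lag = lagged_regressors(y, p)  # matrix of lagged observations
    return np.linalg.solve(Z.T @ Y_lag, Z.T @ Y)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse, which is a numerical convenience rather than a change to the estimator.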

As argued before, instruments $z_t$ which are linear in past observations are considered.
Such instruments are equivalent to instruments that are linear functions of past innovations.
This setup is formalized here. Using $\phi^{-1}(z) = \sum_{j=0}^{\infty} \psi_{\phi,j} z^j$ the time series $y_t$ can
be expressed as a linear filter of past shocks $y_t = \sum_{j=0}^{\infty} \psi_{\phi,j} \varepsilon_{t-j}$. Also, let

$$\alpha_k = E(\varepsilon_t^2 \varepsilon_{t-k}^2) = \sigma(k) + \sigma^4 \qquad (2)$$

and $b_k' = (\psi_{\phi,k-1}, \ldots, \psi_{\phi,k-p})$ with $\psi_{\phi,k} = 0$ for $k < 0$. Then define $P_m' = [b_1, \ldots, b_m]$.

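In Kuersteiner (1997) it is shown that for $z_t = \sum_{k=1}^{\infty} a_k \varepsilon_{t-k}$ with $a_k \in \mathbb{R}^p$ such that for
the $i$-th element $a_{k,i}$ of $a_k$, $\sum_{k=1}^{\infty} |a_{k,i}| < \infty$ for all $i$, the asymptotic distribution of $\hat{\phi}$ is normal
with mean zero and covariance matrix

$$\sigma^{-4} \lim_{m \to \infty} (A_m P_m)^{-1} A_m \Omega_m A_m' (P_m' A_m')^{-1}$$

where $A_m = [a_1, \ldots, a_m]$. The lower bound is

$$\Sigma = \lim_{m \to \infty} (P_m' \Omega_m^{-1} P_m)^{-1} \qquad (3)$$

where it should be noted that under Assumption (A-1), $\Omega_m = \mathrm{Diag}(\alpha_1, \ldots, \alpha_m)$.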

It then follows immediately that the optimal instrument achieving (3) has weights
$a_k = b_k/\alpha_k$. A different way of writing the optimal instrument is $z_t = \lim_{m \to \infty} P_m' \Omega_m^{-1} (\varepsilon_{t-1}, \ldots, \varepsilon_{t-m})'$
almost surely. This is a special case of the more general formulation in Kuersteiner (1997).

The optimal instrument matrix $Z$ is now defined by setting $z_t' = [z_{t,1}, \ldots, z_{t-p+1,p}]$ with
$z_{t+1,k} = \sum_{j=0}^{t} (\psi_{\phi,j}/\alpha_{j+k}) \, \varepsilon_{t-j}$ and letting $Z' = [z_p, \ldots, z_{n-1}]$. It should be emphasized that the
$k$-th instrument, i.e. the instrument for parameter $\phi_k$, is lagged by $k$ periods and has a
convolution filter which is also shifted by $k$ elements.

The estimator introduced so far is not feasible and needs to be approximated. Let $h_0$
be a function of the sequence $\{b_k/\alpha_k\}_{k=1}^{\infty}$ of optimal weights and the true parameter value
$\phi_0$. Define $z_t = \sum_{k=1}^{t-1} (b_k/\alpha_k) \, \varepsilon_{t-k}(\phi_0)$ with $\varepsilon_t(\phi_0) = \phi_0(L) y_t$ and let $Z' = [z_p, \ldots, z_{n-1}]$. The
approximate version of $\hat{\phi}$ is now

$$\hat{\phi}(h_0) = (Z'Y_{-1})^{-1} Z'Y. \qquad (4)$$

A feasible estimator is then obtained by replacing $h_0$ with a consistent estimate $\hat{h}$. The proof
of feasibility proceeds in two steps. In Section 4 it is shown that $\sqrt{n}(\hat{\phi}(h_0) - \hat{\phi}) = o_p(1)$
and in Section 5 we establish that $\sqrt{n}(\hat{\phi}(\hat{h}) - \hat{\phi}(h_0)) = o_p(1)$.
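
A time-domain sketch of the two-step idea may help fix ideas: estimate $\phi$ by OLS, form residuals, estimate the fourth moments $\alpha_k$ by sample averages, and build instruments with weights $b_k/\alpha_k$. This is not the paper's frequency-domain estimator: the names `impulse_response` and `two_step_iv` are hypothetical, the instrument sum is cut off at `max_lag` terms purely to keep the illustration cheap, and no truncation of the estimated moments is applied.

```python
import numpy as np


def impulse_response(phi, m):
    """Return psi_0, ..., psi_{m-1} from psi_s = sum_j phi_j psi_{s-j}, psi_0 = 1."""
    p = len(phi)
    psi = np.zeros(m)
    psi[0] = 1.0
    for s in range(1, m):
        for j in range(1, min(s, p) + 1):
            psi[s] += phi[j - 1] * psi[s - j]
    return psi


def two_step_iv(y, p, max_lag=50):
    """First-stage OLS followed by IV with estimated weights b_k / alpha_k."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Y = y[p:]
    X = np.column_stack([y[p - j:n - j] for j in range(1, p + 1)])
    phi_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ phi_ols                      # first-stage residuals
    T = len(e)
    K = min(max_lag, T - 1)                  # illustrative cut-off (assumption)
    e2 = e**2
    # Sample analogue of alpha_k = E(eps_t^2 eps_{t-k}^2), k = 1, ..., K.
    alpha = np.array([np.mean(e2[k:] * e2[:T - k]) for k in range(1, K + 1)])
    psi = impulse_response(phi_ols, K)
    Z = np.zeros((T, p))
    for k in range(1, K + 1):
        # b_k = (psi_{k-1}, ..., psi_{k-p}) with psi_j = 0 for j < 0.
        b_k = np.array([psi[k - i] if k - i >= 0 else 0.0
                        for i in range(1, p + 1)])
        Z[k:] += np.outer(e[:T - k], b_k / alpha[k - 1])
    phi_iv = np.linalg.solve(Z.T @ X, Z.T @ Y)
    return phi_ols, phi_iv
```

With independent errors the estimated $\alpha_k$ are roughly constant in $k$, so the instruments are approximately proportional to the lagged observations and the second step stays close to OLS.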

In the next two sections the approximation (4) is represented in terms of frequency
domain integrals. This imposes no practical limitations since the frequency domain version
differs from (4) by the exclusion of only a few observations at the beginning of the sample.
The difference is asymptotically of order $O_p(n^{-1})$ and therefore does not affect the first
order asymptotic properties of the estimator.

Formulating the estimator in the frequency domain however offers the potential for
improvements in terms of computational efficiency since the frequency domain formulation
can be used to implement FFT algorithms. Using FFT algorithms reduces the computational
complexity of $\hat{\phi}(\hat{h})$ from $O(n^2)$ to $O(n \log n)$. This improvement is substantial
in applications where the dataset is extremely large and the estimator is computed many
times. Leading examples are model selection procedures in forecasting applications and
simulation studies.
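
The complexity gain can be illustrated on the basic operation involved, applying a one-sided linear filter to a series. The sketch below compares a direct double loop with an FFT-based version; both function names are hypothetical and the example is a generic convolution, not the paper's estimator.

```python
import numpy as np


def causal_filter_direct(w, e):
    """out_t = sum_k w_k e_{t-k}, computed with a double loop."""
    n = len(e)
    out = np.zeros(n)
    for t in range(n):
        for k in range(min(t + 1, len(w))):
            out[t] += w[k] * e[t - k]
    return out


def causal_filter_fft(w, e):
    """Same filter via zero-padded FFTs, costing O(n log n) operations."""
    n = len(e)
    m = n + len(w) - 1                # length of the full linear convolution
    size = 1 << (m - 1).bit_length()  # next power of two, avoids wrap-around
    spec = np.fft.rfft(w, size) * np.fft.rfft(e, size)
    return np.fft.irfft(spec, size)[:n]
```

Zero padding to at least the full convolution length is what makes the circular convolution computed by the FFT agree with the linear one.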


4.  IV  Estimation  in  the  Frequency  Domain

In  this  section  a  frequency  domain  approximation  to  the  optimal  IV  estimator  is  derived. 
Consider  the  inverse  of  the  spectral  density  for  the  AR{p)  model  f~y  (A)  =  ^  |(?!)(e*^)| 

and  let  [(/((e*"^)]    =  9yy{^>  '^)-  The  definition  of  the  spectrum  of  the  squared  errors  will  be 
useful  later  and  is 

$$f_{\varepsilon^2\varepsilon^2}(\lambda) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty} \sigma(k) e^{-i\lambda k} \qquad (1)$$

Define the lag operator $a(\lambda) = [e^{i\lambda}, \ldots, e^{i\lambda p}]$ and denote the complex conjugate transpose
by $a(\lambda)^*$. Also introduce the matrix $A(\lambda) = a(\lambda)^* a(\lambda)$. Then $g_{yy}(\phi, \lambda)$ can be represented
as


9yy(.(t>A) 


Define 


77(0,  A)  = 


'  1  a(A) 

a{Xy    A{X) 

d\ngyy{^,\) 


-itX 


(2) 


and 


and introduce the discrete Fourier transform of the data as $\omega_y(\lambda) = n^{-1/2} \sum_{t=1}^{n} y_t e^{-it\lambda}$ and

the periodogram as $I_{n,yy}(\lambda) = |\omega_y(\lambda)|^2$.
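
A periodogram at the Fourier frequencies can be computed with a single FFT. The sketch below assumes the normalization $n^{-1/2}$ for the discrete Fourier transform; if a different constant such as $(2\pi n)^{-1/2}$ is intended, the result only changes by a fixed factor. The function name `periodogram` is a hypothetical helper.

```python
import numpy as np


def periodogram(y):
    """Return the Fourier frequencies 2*pi*j/n and |dft|^2 with dft = FFT(y)/sqrt(n)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    freqs = 2.0 * np.pi * np.arange(n) / n
    # np.fft.fft indexes time from 0 rather than 1; this changes only the
    # phase of the transform, not its modulus.
    dft = np.fft.fft(y) / np.sqrt(n)
    return freqs, np.abs(dft) ** 2
```

Under this normalization the average of the periodogram ordinates equals the sample second moment of the data, which gives a quick sanity check.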

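With these definitions we turn to the frequency domain implementation of the instrumental
variables estimator introduced at the beginning. It is easy to show that

$$\hat{\phi} = \left[ \int_{-\pi}^{\pi} \mathrm{Re}\left[ I_{n,zy}(\lambda) a(\lambda) \right] d\lambda \right]^{-1} \int_{-\pi}^{\pi} \mathrm{Re}\left[ I_{n,zy}(\lambda) \right] d\lambda + O_p(n^{-1}) \qquad (3)$$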

The main contribution of the paper consists in deriving an approximation to this frequency
domain formulation that depends only on observable quantities. In a first step we
decompose the instrument into observable data and an unobservable optimal filter. In the
next section it is shown how the unknown optimal filter can be consistently estimated.

For the purpose of this and the next section, we introduce the spaces $L^p[-\pi,\pi]$ of
functions $f : [-\pi,\pi] \to \mathbb{C}^p$ such that $\int |f|^p \, d\lambda < \infty$. Also, define the spaces $C^k[-\pi,\pi]$ of
functions $f : [-\pi,\pi] \to \mathbb{R}^p$ such that $f$ is $k$ times continuously differentiable. Throughout,
the function $R_n(\lambda)$ will denote a generic remainder term whose definition can change. We
start by approximating the discrete Fourier transform of the instrument variables $z_{t,k}$


^.,  (A)  =  l^,k  (A)  e^^V(e-^^)a;,  (A)  +  i?^,^  (A) 


(4) 


'A,    _-i 


where  /^.,  (A)  =  ^-„  ^e-^>0+^)  and^i?^^^)  ^  „-i/2  ^oo^  _^^-,A;^^^^.  ^^^  _  j^^^ 
define  the  sequences  ^  =  {ipQj'ipi,  ■■■}  and  ^{k)  =  {i/jQ/akjipi/ai+k,  ■■■}■  Then  let  the  inner 


product  Rl^^s  (A)  =  -^  Yl'^o  ^J^  ^^^^n,j  (A)  be  defined  for  a  sequence  s  =  {sq,  si, ...}  where 

Unj  (A)  =  Ylt=i-j  ^te"^^  —  Y17=i  ^te^^^-  We  can  now  write  the  remainder  term  R^  f.  (A)  in 
a  compact  form 


<fc(A)  =  -W(A)0(e- 


-iX 


)K,i,W  +  RriMk)^>')- 


We  denote  the  vectors  of  stacked  error  terms  by  -R^  (A)   =    [i?^  ^  (A) ,  ...i?^p  (A)]    and 

K,i,^^^  =  [^n,^(i)^^)'  •■•'  Kmp)^^^^''  ^^  ^^  analogous  way  u^^  (-A)  is  the  complex  conju- 
gate of  uJzi^  (A) .  Define  l.^  (A)  as 


l^  (A)  = 


^^,1  (A) 


't/i.p  (A) 


(5) 


The properties of $l_\psi(\lambda)$ determine the asymptotic distribution of the instrumental
variables estimator. The next lemma gives a representation of $l_\psi(\lambda)$ in terms of convolution
operators. This shows that the smoothness of $l_\psi(\lambda)$ is inherited from the smoothness of
the AR(p) spectrum.

Lemma  4.1.  Let  l^^k  (A)  =  E^o  ^1^7^"'^^^'^''^  ^^'^^  otk  =  E  (e?e?_fc)  ,  et  satisfying  As- 
sumption (A-1)  and  ipj  being  the  coefficients  of  the  power  series  expansion  of  4>{z)~  .  Also 
assume  that  (f){z)  =  1  —  ^  a{z)*  has  all  characteristic  roots  outside  the  unit  circle.  Then 
Irj  (A)  =  l^  (A)  -t- 1^  (—A)  can  be  represented  as 

J  —IT  ^ 

where  fa  (A)  =  ZT=-oo  ^j^''^'  ^^^h  a,  =  (^  -  ^)  . 

Proof.  See  Appendix  B  ■ 

Remark 4. Using the convolution operator $*$ a compact notation for $l_\eta(\lambda)$ is $l_\eta(\lambda) =
(f_\alpha * \eta)(\lambda) + \sigma^{-4} \eta(\phi, \lambda)$. The properties of $l_\eta(\lambda)$ can now be determined from those of $f_\alpha$
and $\eta$. For $f_\alpha \in L^1[-\pi,\pi]$ and $\eta \in C^k[-\pi,\pi]$ it follows from Folland (1984), Theorem
8.10, that $l_\eta(\lambda) \in C^k[-\pi,\pi]$, implying that $\sum_{j=0}^{\infty} |\psi_j/\alpha_{k+j}| \, |j| < \infty$. While $f_\alpha \in L^1[-\pi,\pi]$ is
sufficient to obtain this result it is not necessary. Alternatively if $\alpha_k > \alpha > 0$ for some
$\alpha$ and all $k$ then $\sup_k |\alpha_k^{-1}| < \alpha^{-1} < \infty$ and $\sum_{j=0}^{\infty} |\psi_j/\alpha_{k+j}| \, |j| < \alpha^{-1} \sum_{j=0}^{\infty} |\psi_j| \, |j| < \infty$
if $\eta \in C^2[-\pi,\pi]$. These arguments show that $l_\eta(\lambda)$ is sufficiently smooth to apply the
central limit theorem in Appendix A.

The representation of the discrete Fourier transforms of the instruments in terms of the
DFT for the data makes it possible to obtain a frequency domain version of $\hat{\phi}$ without the need to go
through an explicit calculation of the instruments in the time domain. The approximation
relies on the fact that convolutions in the time domain are transformed into multiplications
in the frequency domain and the fact that the residuals can be computed by a simple
multiplication of $\omega_y(\lambda)$ by $\phi(e^{-i\lambda})$.

The discrete Fourier transform of the instrument matrix is then obtained from (4)
by lagging each $\omega_{z,k}(\lambda)$ by $k$ periods leading to $e^{-i\lambda k} \omega_{z,k}(\lambda)$ and stacking the resulting
transforms in the following way


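$$\omega_z(\lambda) = \begin{bmatrix} e^{-i\lambda} \omega_{z,1}(\lambda) \\ \vdots \\ e^{-i\lambda p} \omega_{z,p}(\lambda) \end{bmatrix} = \phi(e^{-i\lambda}) \, \omega_y(\lambda) \, l_\psi(\lambda) + R_n(\lambda), \qquad (6)$$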


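where $R_n(\lambda)$ denotes the element by element product ($\odot$) of $a(\lambda)^*$ and the stacked vector of
remainder terms defined above. The corresponding expression for the conjugate transpose of $\omega_z(\lambda)$ is

$$\omega_z(\lambda)^* = \left[ e^{i\lambda} \omega_{z,1}(-\lambda), \; \ldots, \; e^{i\lambda p} \omega_{z,p}(-\lambda) \right] = \phi(e^{i\lambda}) \, \omega_y(\lambda)^* \, l_\psi(-\lambda) + R_n(-\lambda) \qquad (7)$$

where the symmetry of $l_\psi(\lambda)$ is used.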

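The discrete Fourier transform approximation for $z_t$ introduced at the beginning of
this section produces an asymptotically equivalent estimator based on the periodogram of
$y_t$ and an unknown filter. It is convenient to define

$$h^a(\phi, \lambda) = \mathrm{Re}\left[ l_\psi(-\lambda) \, \phi(e^{i\lambda}) \, a(-\lambda) \right]$$

and

$$h(\phi, \lambda) = \mathrm{Re}\left[ l_\psi(-\lambda) \, \phi(e^{i\lambda}) \right].$$

By substituting for equations (6) and (7), (3) can be approximated by

$$\hat{\phi}(h_0) = \left[ \int_{-\pi}^{\pi} I_{n,yy}(\lambda) \, h^a(\phi, \lambda) \, d\lambda \right]^{-1} \int_{-\pi}^{\pi} I_{n,yy}(\lambda) \, h(\phi, \lambda) \, d\lambda. \qquad (8)$$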


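It is shown in the proof of Proposition (4.2) that $\hat{\phi}(h_0) - \hat{\phi} = o_p(n^{-1/2})$ and

$$\hat{\phi}(h_0) - \phi_0 = \left[ \int_{-\pi}^{\pi} I_{n,yy}(\lambda) \, h^a(\phi, \lambda) \, d\lambda \right]^{-1} \int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda) \, l_\eta(\lambda) \, d\lambda + o_p(n^{-1/2})$$

such that consistency follows from ergodicity and the fact that

$$E \int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda) \, l_\eta(\lambda) \, d\lambda = \sigma^2 \int_{-\pi}^{\pi} l_\eta(\lambda) \, d\lambda = 0.$$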

It is transparent from equation (8) that $\hat{\phi}(h_0)$ is infeasible as it stands, since it depends
on knowledge of the true parameter values and the correlation structure of the squared
errors. Feasible versions of $\hat{\phi}(h_0)$ will be discussed in Section 5 below.

Under the assumption that the weight matrix $\mathrm{Re}[l_\psi(-\lambda) \phi(e^{i\lambda})]$ is known, the
asymptotic distribution of $\hat{\phi}(h_0)$ is now a straightforward consequence of Lemmas (A.2) and
(A.4). The next proposition summarizes this result.


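Proposition 4.2. Let $\phi(L) y_t = \varepsilon_t$ where all roots of $\phi(L)$ are outside the unit circle. If
$\varepsilon_t$ satisfies Assumption (A-1) then for $\hat{\phi}$ defined in (1) and $\hat{\phi}(h_0)$ defined in (8) we have
$\hat{\phi}(h_0) - \hat{\phi} = o_p(n^{-1/2})$ and

$$\sqrt{n}\,(\hat{\phi}(h_0) - \phi_0) \Rightarrow N(0, \sigma^{-4} \Sigma)$$

where $\Sigma$ is defined in (3).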


Proof.  See  Appendix  B  ■ 

The remainder of the paper will now be concerned with the construction of a
semiparametric estimator with the same distribution as $\hat{\phi}$.


5.  Adaptive  Estimation 

To develop a feasible efficient IV procedure, it has to be established that $h(\phi, \lambda) =
\mathrm{Re}[l_\psi(\lambda) \phi(e^{-i\lambda})]$ and $h^a(\phi, \lambda) = \mathrm{Re}[l_\psi(\lambda) \phi(e^{-i\lambda}) a(\lambda)]$ can be replaced by consistent
estimates without affecting the limiting properties of the estimator. A semiparametric
estimator having this property is called adaptive. No confusion should arise between this
use of the terminology and the literature on feasible local minimax estimators such as
Bickel (1982), Kreiss (1987), Linton (1993) and Steigerwald (1994). The main difference,
apart from efficiency issues, is the fact that here a nonparametric correction to the
criterion function is made while the local minimax literature makes a nonparametric one step
Newton Raphson improvement to a consistent first stage estimator.

Different approaches to prove adaptiveness are used in the semiparametric literature.
Direct calculation is used in Robinson (1987, 1988) in the context of iid models and partially
linear models and by Hidalgo (1992) in the context of time series regression models. Newey
(1991) applies similar techniques as Robinson (1987) to the instrumental variables case for
iid data. Andrews (1994) develops a general methodology based on stochastic
equicontinuity arguments and applies it to the partially linear framework. Andrews' approach will
be used here to break the proof into two parts. First, it is shown that a nonparametric
estimate $\hat{h}(\hat{\phi}, \lambda)$ converges to $h(\phi_0, \lambda)$ uniformly with probability one. The second step is
to establish that uniformly in a shrinking neighborhood of the true filter $h(\phi_0, \lambda)$ the
distribution of an estimator is arbitrarily close to the distribution of the estimator based
on the true filter.

This argument will now be formalized. Let $l_\psi : [-\pi,\pi] \to \mathbb{C}^p$ and $\phi : [-\pi,\pi] \to \mathbb{C}$
where $\mathbb{C}$ is the complex plane. Then introduce a set of functions $\mathcal{H}$ defined as

$$\mathcal{H} = \left\{ h : [-\pi,\pi] \to \mathbb{R}^p \;\middle|\; h = \mathrm{Re}\left[ l_\psi(-\lambda) \phi(e^{i\lambda}) \right]; \; \mathrm{Re}[l_\psi(-\lambda)], \mathrm{Re}[\phi(e^{i\lambda})] \in C^k[-\pi,\pi] \right\} \qquad (1)$$

where $C^k[-\pi,\pi]$ denotes the space of $k$ times continuously differentiable functions. Define
the $L^\infty$ Sobolev norm of order one as

$$\|f\|_1 = \sup_{\lambda \in [-\pi,\pi]} \|f(\lambda)\| + \sup_{\lambda \in [-\pi,\pi]} \left\| \frac{\partial f(\lambda)}{\partial \lambda} \right\|$$

where $\|\cdot\|$ is the Euclidean matrix norm defined by $\|A\| = (\mathrm{tr}\, AA^*)^{1/2}$. Introduce the metric
on $\mathcal{H}$ as

$$\rho(h_1, h_2) = \|l_{\psi,1} - l_{\psi,2}\|_1 + \|\phi_1 - \phi_2\|_1.$$

$(\mathcal{H}, \rho)$ is a complete metric space. If $\phi$ is given by (1) and $l_\psi$ by (5) then it follows from
Lemma (4.1) that $l_\psi(\phi, \lambda) \in C^k[-\pi,\pi]$. Therefore $h(\phi, \lambda) \in \mathcal{H}$.

We proceed by defining the estimator for $h(\phi, \lambda)$. We have established that we can
obtain a consistent estimate $\hat{\phi}$ for example from $\hat{\phi} = (Y_{-1}'Y_{-1})^{-1} Y_{-1}'Y$ or from its frequency
domain analog introduced before. Residuals as a function of some fixed parameter value
$\phi$ are obtained as in Kreiss (1987) from

$$\varepsilon_t(\phi) = \varepsilon_t(\phi_0) - (\phi - \phi_0)'(y_{t-1}, \ldots, y_{t-p})' \qquad (2)$$

such that the estimated error $\varepsilon_t(\phi)$ can be decomposed into the true error and the $\mathcal{F}_{t-1}$
measurable part $-(\phi - \phi_0)'(y_{t-1}, \ldots, y_{t-p})'$. We form the following statistics.


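$$\hat{\alpha}_k(\phi) = \begin{cases} \dfrac{1}{n} \sum_{t=p+k+1}^{n} \varepsilon_t^2(\phi) \, \varepsilon_{t-k}^2(\phi) & \text{if this average exceeds } d_n \\ d_n & \text{else} \end{cases}$$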

where the sequence $d_n > 0$ for all $n$ with $d_n = O(n^{-1/2+\nu})$ for some $0 < \nu < 1/2$. The
truncation numbers $d_n$ are used to avoid "too large" values for $\hat{\alpha}_k^{-1}(\phi)$. Truncation was
introduced by Bickel (1982) in the context of score estimation. More closely related to
our context is Hidalgo's (1992) semiparametric frequency domain estimator. Simulation
experiments indicate that the truncation plays no role in practice and can therefore be
ignored in applications.

Next, an estimate for $b_k = (2\pi)^{-1} \int_{-\pi}^{\pi} \eta(\phi, \lambda) e^{i\lambda k} d\lambda$ is needed. The vector $b_k$ contains
the impulse response function of the AR(p) model evaluated at different points. Here we
want to express $b_k$ directly as a function of the underlying AR-parameters. From the
definition of $\eta(\phi, \lambda)$ in (2) and the expansion $\phi^{-1}(z) = \sum_j \psi_{\phi,j} z^j$ with $\psi_{\phi,j} = 0$ for $j < 0$,
$b_k$ can be written as

$$b_k = (\psi_{\phi,k-1}, \ldots, \psi_{\phi,k-p})'$$

where the coefficients $\psi_{\phi,j}$ satisfy the recursion $\psi_{\phi,s} - \phi_1 \psi_{\phi,s-1} - \cdots - \phi_p \psi_{\phi,s-p} = 0$ for
all $s > 0$ and $\psi_{\phi,0} = 1$ (see Kreiss, 1987). Let $\psi_\phi$ denote the vector of the first $p + 1$
coefficients of the polynomial expansion of $\phi^{-1}(z)$. This vector is the solution to

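$$\begin{bmatrix} 1 & 0 & \cdots & & 0 \\ -\phi_1 & 1 & & & \\ -\phi_2 & -\phi_1 & 1 & & \vdots \\ \vdots & & \ddots & \ddots & \\ -\phi_p & \cdots & -\phi_2 & -\phi_1 & 1 \end{bmatrix} \begin{bmatrix} \psi_{\phi,0} \\ \psi_{\phi,1} \\ \vdots \\ \psi_{\phi,p} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (3)$$

which is denoted by $\psi_\phi = \Phi^{-1} e_1$ where $e_1$ is the first unit vector and $\Phi$ is the matrix
defined in (3). Then, let $b_{p+1}$ denote the vector of coefficients $\psi_{\phi,2}$ to $\psi_{\phi,p+1}$. Using a
$p \times (p+1)$ selector matrix $S_1$ picking the last $p$ elements from a $(p+1) \times 1$ vector we have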


hp+i  = 


0      1 
0 


1 

<t>V 


Si^l  =  r^5i$-^ei 


where we define $\Gamma_\phi = \phi_1$ if $p = 1$. In a similar way we obtain $b_{p+s} = \Gamma_\phi^s S_1 \Phi^{-1} e_1$. The
vectors $b_k$ can now be expressed as functions of the underlying parameters by


bk  = 


5p,fcr("^"'t°''^-^1^5i$-iei 


where  the  convention  T9  =  Ip  is  assumed  and  the  selector  matrix  Sp^k  is  defined  by 

with the indicator function $\{\cdot\} = 1$ if the expression inside the bracket is true. The Fourier
coefficient $b_k$ is continuous in the underlying parameters for all finite $k$ and can therefore
be consistently estimated from a consistent estimate $\hat{\phi}$.
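
The recursion for the impulse response coefficients and the lower triangular system (3) can be checked against each other numerically. The sketch below uses hypothetical helper names and builds $b_k$ directly from the definition $b_k = (\psi_{\phi,k-1}, \ldots, \psi_{\phi,k-p})'$ rather than through the selector-matrix expression.

```python
import numpy as np


def psi_by_recursion(phi, m):
    """psi_0, ..., psi_{m-1} from psi_s = phi_1 psi_{s-1} + ... + phi_p psi_{s-p}."""
    p = len(phi)
    psi = np.zeros(m)
    psi[0] = 1.0
    for s in range(1, m):
        for j in range(1, min(s, p) + 1):
            psi[s] += phi[j - 1] * psi[s - j]
    return psi


def psi_by_linear_system(phi):
    """First p+1 coefficients as the solution of the triangular system Phi psi = e_1."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    Phi = np.eye(p + 1)
    for j in range(1, p + 1):
        Phi -= phi[j - 1] * np.eye(p + 1, k=-j)  # -phi_j on the j-th subdiagonal
    e1 = np.zeros(p + 1)
    e1[0] = 1.0
    return np.linalg.solve(Phi, e1)


def b_vector(phi, k):
    """b_k = (psi_{k-1}, ..., psi_{k-p}) with psi_j = 0 for j < 0."""
    p = len(phi)
    psi = psi_by_recursion(phi, max(k, 1))
    return np.array([psi[k - i] if k - i >= 0 else 0.0 for i in range(1, p + 1)])
```

Evaluating `b_vector` at a consistent estimate of $\phi$ gives the plug-in estimate of $b_k$ described in the text, since each coefficient is a polynomial in the autoregressive parameters.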
A  nonparametric  estimate  of  h  (cj),  A)  is  now  defined  as 

n—p—l 


and 


hr 


(^,. 


A    =Re 


(A)0(e 


-i\ 


(4) 


No additional kernel smoothing is needed. The reason is that $h(\phi, \lambda)$ is already a
convolution between a bounded sequence and a twice continuously differentiable function. In
fact, the $b_k$ implicitly contain a bandwidth since for every $\phi$ inside the stationary region
they will decay to zero quickly.

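We will also need the following matrix $\hat{h}_n^a(\hat{\phi}, \lambda)$, whose elements are continuous
functions of $\hat{h}_n(\hat{\phi}, \lambda)$ and which is defined by

$$\hat{h}_n^a(\hat{\phi}, \lambda) = \mathrm{Re}\left[ \hat{l}_\psi(\lambda) \, \hat{\phi}(e^{-i\lambda}) \, a(\lambda) \right]$$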


The success of a semiparametric estimator depends on the ability to uniformly estimate
the weights $\alpha_k^{-1}$. Additional assumptions about the moments of the driving error process
are needed to assure this. Since $\alpha_k$ depends on fourth moments such conditions necessarily
involve higher than fourth moments. Here we prove uniform convergence by a mean square
argument which necessitates summability assumptions on eighth moments. The following
assumption is sufficient to prove the main result.

Assumption B-1. Let $c_{\varepsilon \ldots \varepsilon}(t_1, \ldots, t_{k-1})$ be the $k$-th order cumulant of the error process
$\varepsilon_t$. Then

$$\sum_{t_1} \cdots \sum_{t_{k-1}} (1 + |t_j|) \, |c_{\varepsilon \ldots \varepsilon}(t_1, \ldots, t_{k-1})| < \infty \quad \text{for all } j = 1, \ldots, k-1 \text{ and } k = 2, 3, \ldots, 8.$$

Assumption (B-1) implies that higher order cumulant spectra of order eight exist. This
assumption enables us to state the following result.

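Proposition 5.1. Let $\hat{h}_n(\hat{\phi}_n, \lambda)$ be as defined in (4), let Assumptions (A-1, B-1) hold and
assume that $\hat{\phi}_n \to \phi_0$ in probability. Then

$$\sup_{\lambda \in [-\pi,\pi]} \left\| \hat{h}_n(\hat{\phi}_n, \lambda) - h(\phi_0, \lambda) \right\| = o_p(1)$$

as $n \to \infty$. Also $P\left( \rho(\hat{h}_n(\hat{\phi}_n, \lambda), h(\phi_0, \lambda)) > \delta \right) \to 0$ for any $\delta > 0$ as $n \to \infty$ and
$P\left[ \hat{h}_n(\hat{\phi}_n, \lambda) \in \mathcal{H} \right] \to 1$ as $n \to \infty$.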


Proof.  See  Appendix  B  ■ 

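We proceed to define the semiparametric estimator $\hat\phi_n(\hat h_n)$ by replacing $h_0 = h(\phi_0,\lambda)$ with the nonparametric estimate (4). We will establish that

$$\sqrt n\left(\hat\phi_n(\hat h_n) - \hat\phi_n(h_0)\right) = o_p(1). \qquad (5)$$

By applying Lemma (A.2) it can be shown that for $h\in\mathcal H$

$$\begin{aligned}
I_{n,yy}(\lambda)\,h(\phi,\lambda) &= I_{n,yy}(\lambda)\,\mathrm{Re}\left[l_\eta(\lambda)\phi(e^{-i\lambda})a(\lambda)\right]\phi_0 &\qquad (6)\\
&\quad + I_{n,\varepsilon\varepsilon}(\lambda)\,h_{\phi_0}(\phi,\lambda) + h(\phi,\lambda)R_n(\lambda) &\qquad (7)
\end{aligned}$$

where the remainder term $R_n(\lambda)$ is such that $\sqrt n\int R_n(\lambda)\varsigma(\lambda)\,d\lambda = o_p(1)$ for any continuous function $\varsigma(\lambda)$ with absolutely summable Fourier coefficients. Let

$$h_{\phi_0}(\phi,\lambda) = \mathrm{Re}\left[l_\eta(-\lambda)\phi(e^{i\lambda})\phi_0^{-1}(e^{i\lambda})\right]$$

such that $h_{\phi_0}(\phi_0,\lambda) = \mathrm{Re}\left[l_\eta(-\lambda)\right] = l_\eta(\lambda)$ and

$$\hat h_{\phi_0}(\hat\phi,\lambda) = \mathrm{Re}\left[\hat l_\eta(-\lambda)\hat\phi(e^{i\lambda})\phi_0^{-1}(e^{i\lambda})\right].$$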


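(5) then follows if

$$\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\left(\hat h^a_n(\hat\phi_n,\lambda) - h^a(\phi_0,\lambda)\right)d\lambda = o_p(1) \qquad (8)$$

and

$$\begin{aligned}
\sqrt n\int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda)\left(\hat h_{\phi_0}(\hat\phi_n,\lambda) - l_\eta(\lambda)\right)d\lambda &= o_p(1) &\qquad (9)\\
\sqrt n\int_{-\pi}^{\pi} R_n(\lambda)\left(\hat h_n(\hat\phi_n,\lambda) - h(\phi_0,\lambda)\right)d\lambda &= o_p(1). &\qquad (10)
\end{aligned}$$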
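(8) can be established easily with the help of Proposition (5.1) by the following argument:

$$\begin{aligned}
\left\|\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\left(\hat h^a_n(\hat\phi_n,\lambda) - h^a(\phi_0,\lambda)\right)d\lambda\right\|
&\le \sup_{\lambda\in[-\pi,\pi]}\left\|\hat h^a_n(\hat\phi_n,\lambda) - h^a(\phi_0,\lambda)\right\| \int_{-\pi}^{\pi} I_{n,yy}(\lambda)\,d\lambda \\
&\le 2\sup_{\lambda\in[-\pi,\pi]}\left\|\hat l_\eta(-\lambda)\hat\phi(e^{i\lambda}) - l_{\eta,0}(-\lambda)\phi_0(e^{i\lambda})\right\| \sup_{\lambda\in[-\pi,\pi]}\|a(\lambda)\|\;\hat\gamma_{yy}(0) \to 0
\end{aligned}$$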

where the first inequality uses the fact that $I_{n,yy}(\lambda)$ is a positive scalar and the second inequality uses $\mathrm{tr}(ab'ba') = (a'a)(b'b)$ where $a$ and $b$ are two conformable vectors. The last expression goes to zero by Proposition (5.1) and the fact that $\sup_{\lambda\in[-\pi,\pi]}\|a(\lambda)\|$ is bounded. To prove (9) we work with the metric space $(\mathcal H,\rho)$ defined in (1). Also let $h_0 = h(\phi_0,\lambda)$, $\hat h = \hat h_n(\hat\phi_n,\lambda)$, $\hat\phi = \hat\phi(e^{i\lambda})$, $\phi_0 = \phi_0(e^{i\lambda})$ and

$$v_n(h) = \sqrt n\int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda)\left(h_{\phi_0}(\phi,\lambda) - l_{\eta,0}\right) + R_n(\lambda)\,(h - h_0)\,d\lambda$$

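for $h\in\mathcal H$. Following Andrews (1994), (9) follows if for any given $\vartheta,\varepsilon>0$ there exists a $\delta>0$ such that

$$\begin{aligned}
\limsup_{n\to\infty} P\left(\|v_n(\hat h_n) - v_n(h_0)\| > \vartheta\right)
&\le \limsup_{n\to\infty} P\left(\|v_n(\hat h_n) - v_n(h_0)\| > \vartheta,\ \hat h_n\in\mathcal H,\ \rho(\hat h_n,h_0)\le\delta\right) \\
&\quad + \limsup_{n\to\infty} P\left(\hat h_n\notin\mathcal H \text{ or } \rho(\hat h_n,h_0)>\delta\right) \\
&\le \limsup_{n\to\infty} P\left(\sup_{h\in\mathcal H,\ \rho(h,h_0)\le\delta}\|v_n(h) - v_n(h_0)\| > \vartheta\right) < \varepsilon.
\end{aligned}$$

Proposition (5.1) establishes that

$$\limsup_{n\to\infty} P\left(\hat h_n\notin\mathcal H \text{ or } \rho(\hat h_n,h_0)>\delta\right) = 0.$$

Therefore, if

$$\limsup_{n\to\infty} P\left(\sup_{h\in\mathcal H,\ \rho(h,h_0)\le\delta}\|v_n(h) - v_n(h_0)\| > \vartheta\right) < \varepsilon \qquad (11)$$

then the following theorem can be established.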


Theorem 5.2. Let $\hat h_n(\hat\phi_n,\lambda)$ be defined in (4). Let Assumption (A-1) hold and let $\hat\phi_n$ be a previous estimator for which $\hat\phi_n\to\phi_0$ in probability or almost surely. Then the semiparametric estimator $\hat\phi(\hat h_n)$ defined by

$$\hat\phi(\hat h_n) = \left[\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\,\hat h^a_n(\hat\phi_n,\lambda)\,d\lambda\right]^{-1}\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\,\hat h_n(\hat\phi_n,\lambda)\,d\lambda$$

has a limiting distribution characterized by

$$\sqrt n\left(\hat\phi(\hat h_n) - \phi_0\right) \Rightarrow N(0,\sigma^{-4}\Sigma)$$

Proof.  See  Appendix  B  ■. 

This result establishes the feasibility of a semiparametric estimator that improves on the efficiency of the conventional Gaussian estimator in the presence of higher order dependence. The frequency domain representation makes it possible to avoid estimating the instruments for each observation in the sample. Instead, an optimal filter applied to the periodogram of the data leads to an asymptotically equivalent procedure. Moreover, because the optimal filter itself is a convolution integral in the frequency domain, the problem of truncating the approximation of the optimal instrument at a given lag is solved in a natural way, and no lag truncation parameter is needed for the number of instruments used.
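
To make the role of the periodogram concrete, the following sketch computes a periodogram at the Fourier frequencies and approximates a filtered integral of the form $\frac{1}{2\pi}\int I_{n,yy}(\lambda)\,w(\lambda)\,d\lambda$ by an average over those frequencies. It is an illustration only: the function names, the normalization $I_{n,yy}(\lambda) = n^{-1}|\sum_t y_t e^{-i\lambda t}|^2$ and the weight $w(\lambda)=\cos\lambda$ are assumptions made here, and the optimal filter of Theorem 5.2 is not implemented.

```python
import numpy as np

def periodogram(y):
    """Periodogram |sum_t y_t exp(-i*lam*t)|^2 / n at lam_s = 2*pi*s/n, s = 0..n-1.

    The 1/n normalization (no 2*pi factor) is an assumption of this sketch.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    freqs = 2.0 * np.pi * np.arange(n) / n
    return freqs, np.abs(np.fft.fft(y)) ** 2 / n

def weighted_periodogram_mean(y, weight):
    """Average of I(lam_s) * weight(lam_s) over the Fourier frequencies.

    This is the Riemann-sum analogue of (1/(2*pi)) * integral of I * weight.
    """
    freqs, pgram = periodogram(y)
    return np.mean(pgram * weight(freqs))

def circular_autocov(y, k):
    """Circular (wrap-around) sample autocovariance at lag k, without demeaning."""
    y = np.asarray(y, dtype=float)
    return np.mean(y * np.roll(y, -k))

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
y = rng.standard_normal(256)        # placeholder data
lag1_freq = weighted_periodogram_mean(y, np.cos)
lag1_time = circular_autocov(y, 1)  # same quantity computed in the time domain
```

With the cosine weight the frequency-domain average coincides with the circular lag-one autocovariance, which illustrates how a filter applied to the periodogram can stand in for time-domain moment calculations.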

6.  Monte  Carlo  Simulations 

In this section a small Monte Carlo experiment is reported. To keep the exposition as simple as possible we focus on an AR(1) model. We consider what the efficiency gains or losses of the IV estimator are relative to a correctly specified likelihood procedure and relative to OLS.

The following questions are of interest: Under what circumstances does the optimal IV estimator achieve efficiency gains, how big are they relative to the Gaussian QMLE, and how much is lost by not specifying the true likelihood? These questions are analyzed for the case where the true generating mechanism is an ARCH(1) process.

We generate samples of size $n = 256$, $n = 512$ and $n = 1024$ from the following model

$$y_t = \phi y_{t-1} + \varepsilon_t \qquad (1)$$

where $\varepsilon_t$ is generated by the ARCH(1) process $\varepsilon_t = u_t h_t^{1/2}$ where $h_t = \gamma_0 + \gamma_1\varepsilon_{t-1}^2$ with $u_t \sim N(0,1)$. Starting values are $y_0 = 0$ and $\varepsilon_0 = 0$. Small sample properties of three different estimators to be defined below are evaluated for different values of $\phi, \gamma_1 \in [0,1)$. It is clear from Milhoj (1985) that asymptotic normality established in previous chapters only obtains for values of $\gamma_1 \in [0, \sqrt{1/3})$. Nevertheless, simulation results are reported for parametrizations outside this interval in order to analyze the robustness of the proposed IV procedure to departures from the assumptions. The parameter $\gamma_0$ is fixed at .1 for all experiments.
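
The data generating process above can be sketched as follows. This is a minimal illustration, not the code used for the reported experiments: the function name, the random number generator and the seed are assumptions made here.

```python
import numpy as np

def simulate_ar1_arch1(n, phi, gamma0, gamma1, rng):
    """Simulate y_t = phi*y_{t-1} + eps_t with ARCH(1) errors.

    eps_t = u_t * sqrt(h_t), h_t = gamma0 + gamma1 * eps_{t-1}^2, u_t ~ N(0, 1),
    started from y_0 = 0 and eps_0 = 0. Returns (y_1..y_n, eps_1..eps_n).
    """
    y = np.zeros(n + 1)
    eps = np.zeros(n + 1)
    u = rng.standard_normal(n + 1)  # u[0] is drawn but never used
    for t in range(1, n + 1):
        h_t = gamma0 + gamma1 * eps[t - 1] ** 2
        eps[t] = u[t] * np.sqrt(h_t)
        y[t] = phi * y[t - 1] + eps[t]
    return y[1:], eps[1:]

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
y, eps = simulate_ar1_arch1(n=256, phi=0.9, gamma0=0.1, gamma1=0.5, rng=rng)
```

Setting `gamma1=0.0` gives iid Gaussian errors with variance `gamma0`, which corresponds to the case without ARCH effects.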

The parameter $\phi$ is estimated by three different estimators. The least squares estimator is denoted by $\hat\phi_n^{OLS} = \sum_{t=2}^n y_t y_{t-1} / \sum_{t=2}^n y_{t-1}^2$. The optimal instrumental variables estimator is obtained from the consistent first stage estimator $\hat\phi_{OLS}$ as

$$\hat\phi_n^{IV} = \left[\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\,\hat h^a(\hat\phi_{OLS},\lambda)\,d\lambda\right]^{-1}\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\,\hat h(\hat\phi_{OLS},\lambda)\,d\lambda$$

where $\hat h^a(\phi,\lambda)$ and $\hat h(\phi,\lambda)$ are computed as explained in Section 5. If the data are generated by (1), the likelihood estimator $\hat\phi_n^{ML}$ is obtained from maximizing

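$$l(\phi,\gamma_0,\gamma_1) = -\frac{1}{2}\sum_{t}\left(\ln h_t + \frac{\varepsilon_t^2}{h_t}\right) \qquad (2)$$

with $\varepsilon_t = y_t - \phi y_{t-1}$ and $h_t = \gamma_0 + \gamma_1\varepsilon_{t-1}^2$. We use the BHHH algorithm described in Engle (1982) to maximize the likelihood.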
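
As an illustration of two of the three estimators, the sketch below evaluates the least squares estimator and the Gaussian ARCH(1) log-likelihood for a given sample. The function names are hypothetical, the likelihood is written up to an additive constant with a scale factor of 1/2 (an assumption that does not change the maximizer), and the BHHH iterations themselves are not implemented.

```python
import numpy as np

def ols_ar1(y):
    """Least squares estimate sum_{t=2}^n y_t y_{t-1} / sum_{t=2}^n y_{t-1}^2."""
    y = np.asarray(y, dtype=float)
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

def arch1_loglik(params, y):
    """Gaussian log-likelihood of an AR(1) model with ARCH(1) errors.

    params = (phi, gamma0, gamma1). Residuals eps_t = y_t - phi*y_{t-1} exist
    for t = 2..n, so h_t = gamma0 + gamma1*eps_{t-1}^2 is available for t = 3..n.
    Dropping the first residual from the sum is a choice made in this sketch.
    """
    phi, gamma0, gamma1 = params
    y = np.asarray(y, dtype=float)
    eps = y[1:] - phi * y[:-1]
    h = gamma0 + gamma1 * eps[:-1] ** 2
    return -0.5 * np.sum(np.log(h) + eps[1:] ** 2 / h)

toy = [1.0, 2.0, 4.0, 8.0]                  # toy series, not simulated data
phi_ols = ols_ar1(toy)                      # (2 + 8 + 32) / (1 + 4 + 16)
ll_at_zero = arch1_loglik((0.0, 1.0, 0.0), toy)
```

In practice `arch1_loglik` would be passed, with its sign flipped, to a numerical optimizer. Any general purpose routine can play that role here; the text uses BHHH.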

Figure  1 

Figure 1 shows the potential efficiency gains of the IV estimator relative to the Gaussian QMLE as a function of the autoregressive parameter $\phi$. The efficiency gains are computed from the asymptotic covariance matrix when the generating mechanism is (1). More explicitly, the asymptotic covariance matrix of $\hat\phi_n^{OLS}$ can be expressed as

$$\sigma^2_{OLS}(\phi,\gamma_0,\gamma_1) = \frac{(1-\phi^2)^2}{\sigma^4}\sum_{i=0}^{\infty}\phi^{2i}\alpha_{i+1} \qquad (3)$$

where $\sigma^4 = (\gamma_0/(1-\gamma_1))^2$ and $\alpha_{i+1} = 2\gamma_0^2\gamma_1^{i+1}/[(1-\gamma_1)^2(1-3\gamma_1^2)] + \sigma^4$. The asymptotic covariance matrix for the optimal IV estimator can be obtained from (3). It is given by

$$\sigma^2_{IV}(\phi,\gamma_0,\gamma_1) = \left[\sigma^4\sum_{i=0}^{\infty}\phi^{2i}\alpha_{i+1}^{-1}\right]^{-1}. \qquad (4)$$

Figure 1 plots $\sigma^2_{IV}(\phi,.1,\gamma_1)/\sigma^2_{OLS}(\phi,.1,\gamma_1)$ for $\phi\in[0,1)$ and different values of $\gamma_1$.
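
The ratio plotted in Figure 1 can be approximated numerically. The sketch below assumes the two asymptotic variances take the forms $\sigma^2_{OLS} = (1-\phi^2)^2\sigma^{-4}\sum_{i\ge 0}\phi^{2i}\alpha_{i+1}$ and $\sigma^2_{IV} = [\sigma^4\sum_{i\ge 0}\phi^{2i}\alpha_{i+1}^{-1}]^{-1}$, with $\alpha_{i+1}$ the ARCH(1) fourth moment given in the text. The function names and the truncation point `n_terms` of the infinite sums are assumptions of this illustration.

```python
import numpy as np

def alpha(j, gamma0, gamma1):
    """alpha_j = E[eps_t^2 eps_{t-j}^2] for an ARCH(1) process, j >= 1.

    Requires 3*gamma1^2 < 1 so that the fourth moment exists.
    """
    sigma4 = (gamma0 / (1.0 - gamma1)) ** 2
    excess = 2.0 * gamma0**2 * gamma1**j / ((1.0 - gamma1) ** 2 * (1.0 - 3.0 * gamma1**2))
    return excess + sigma4

def asymptotic_variances(phi, gamma0, gamma1, n_terms=2000):
    """Return (var_ols, var_iv), truncating the infinite sums at n_terms."""
    if not 3.0 * gamma1**2 < 1.0:
        raise ValueError("fourth moments do not exist for this gamma1")
    i = np.arange(n_terms)
    weights = phi ** (2 * i)                 # phi^(2i)
    a = alpha(i + 1, gamma0, gamma1)         # alpha_{i+1}
    sigma4 = (gamma0 / (1.0 - gamma1)) ** 2
    var_ols = (1.0 - phi**2) ** 2 / sigma4 * np.sum(weights * a)
    var_iv = 1.0 / (sigma4 * np.sum(weights / a))
    return var_ols, var_iv

v_ols_iid, v_iv_iid = asymptotic_variances(0.5, 0.1, 0.0)   # no ARCH effects
v_ols, v_iv = asymptotic_variances(0.9, 0.1, 0.5)           # strong dependence
ratio = v_iv / v_ols
```

Without ARCH effects both expressions reduce to $1-\phi^2$, so the ratio is one. By the Cauchy-Schwarz inequality the ratio cannot exceed one when $\alpha_{i+1}$ varies with $i$.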

These theoretical gains are contrasted to the empirical efficiency of the estimators $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ based on 3000 replications for sample sizes 256, 512 and 1024. The results are summarized in Table 1.

Table  1 

As expected, gains for the IV estimator are achieved for models where the autoregressive parameter is above .5. This conforms with the theoretical analysis based on asymptotic approximations. For the sample sizes considered here, the theoretical efficiency gains are not achieved completely. The table shows that the relative efficiency of the IV estimator improves with the sample size. The most significant increase takes place from size 256 to 512. It is also interesting to note that the IV procedure maintains its properties even for values of $\gamma_1 > \sqrt{1/3}$. In fact the gains are strongest when both autocorrelation and dependence in the conditional variance are strong.

Figure 2 shows the empirical densities of the three estimators $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ when no ARCH effects are present.

Figure  2 

The graph confirms the information summarized in the tables: the three estimators are identical under iid conditions. Figure 3 shows the empirical distributions of $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ for a sample size of 1024 when $\phi = .9$ and $\gamma_1 = .9$.

Figure  3 
Here $\hat\phi_n^{ML}$ clearly dominates the two other estimators in terms of efficiency and mean and median unbiasedness. The IV estimator has surprisingly good properties even though the asymptotic theory used for its construction does not hold for this set of parameter values.

Table 2 contains the means and medians for $\hat\phi_n^{OLS}$, $\hat\phi_n^{IV}$ and $\hat\phi_n^{ML}$ when $n = 512$ based on 3000 replications.

Table  2 

The bias tends to be largest for the IV estimator, but the difference between $\hat\phi_n^{OLS}$ and $\hat\phi_n^{IV}$ is smaller than the difference of the former with $\hat\phi_n^{ML}$. The bias for $\hat\phi_n^{OLS}$ and $\hat\phi_n^{IV}$ increases with $\phi$. For a fixed $\phi$, it is largest when $\gamma_1 = .5$. The bias of the ML estimator, on the other hand, is little affected by the parametrization of the model.

7.  Conclusions 

This paper develops efficient IV estimators for autoregressive models with martingale innovations. Popular parametric examples of such innovation processes are ARCH, GARCH and stochastic volatility models. A vast empirical literature documents the presence of these effects in many macroeconomic and financial time series.

The paper shows that estimation of the autoregressive parameters by standard Gaussian ML techniques leads to inefficient estimators. An important result in Kuersteiner (1997) is that GMM estimators based on lagged instruments improve efficiency and therefore dominate OLS. It is shown how to construct the best GMM estimator based on instruments that are linear in past observations.

A common problem of GMM procedures based on a large number of instruments is their poor small sample performance. This is mainly due to estimates of a high dimensional weight matrix. We introduce a novel decomposition of the weight matrix which leads to an orthogonalization of the instrument space. This decomposition has the advantage that it can be implemented in a computationally efficient way by using FFT algorithms. Moreover, the estimator does not require a bandwidth choice for the number of instruments used and is therefore straightforward to use in practice.

The small sample properties of the procedures developed are very promising. The estimators are equivalent to conventional OLS even when the innovations are iid and strictly dominate OLS when the innovations are conditionally heteroskedastic. These results hold for sample sizes as small as 250 observations and are therefore relevant for macroeconomic time series.




A.  Appendix  -  Lemmas 


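Lemma A.1. Under Assumption (A-1) the following statements hold:

a) for each $m\in\mathbb N_+\setminus\{1\}$, $m$ fixed, the vector $\frac{1}{\sqrt n}\sum_{t=1}^n\left[\varepsilon_t\varepsilon_{t-1},\ldots,\varepsilon_t\varepsilon_{t-m}\right]' \Rightarrow N(0,\Omega)$ with

$$\Omega = \begin{bmatrix}\sigma(1)+\sigma^4 & & \\ & \ddots & \\ & & \sigma(m)+\sigma^4\end{bmatrix}$$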
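Lemma A.2. Let $I_{n,yy}(\lambda)$ be the periodogram of $\{y_1,\ldots,y_n\}$ and let $I_{n,\varepsilon\varepsilon}(\lambda)$ be the periodogram of $\{\varepsilon_1,\ldots,\varepsilon_n\}$. Assume that $\varepsilon_t$ satisfies Assumption (A-1) and that $y_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$ with spectral density $\frac{\sigma^2}{2\pi}g_{yy}(\phi_0,\lambda)$ such that $\sum_{j=0}^{\infty}|\psi_j|\,|j| < \infty$. Let $\varsigma(\cdot)$ be any continuous function $[-\pi,\pi]\to\mathbb R$ with absolutely summable Fourier coefficients $\{z_k, -\infty<k<\infty\}$. Then for any $\eta,\varepsilon>0$

$$P\left(\left|\sqrt n\int_{-\pi}^{\pi} I_{n,yy}(\lambda)\varsigma(\lambda)\,d\lambda - \sqrt n\int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda)\,g_{yy}(\phi_0,\lambda)\,\varsigma(\lambda)\,d\lambda\right| > \eta\right) < \varepsilon$$

as $n\to\infty$.

Proof. The proof is the same as the proof in Brockwell and Davis (1987), p. 388, with minor modifications. ■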

Lemma A.3. (Billingsley) Let $X_{mn}$, $Y_n$ be random variables defined on $(\Omega,\mathcal F,P)$. Suppose that for each $m$, $X_{mn}\Rightarrow X_m$ as $n\to\infty$ and that $X_m\Rightarrow X$ as $m\to\infty$. Suppose further that

$$\lim_{m\to\infty}\limsup_{n\to\infty} P\{|X_{mn} - Y_n| > \varepsilon\} = 0$$

for each positive $\varepsilon$. Then $Y_n\Rightarrow X$ as $n\to\infty$.

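Lemma A.4. Let $I_{n,yy}(\lambda)$ be the periodogram of $\{y_1,\ldots,y_n\}$ and let $I_{n,\varepsilon\varepsilon}(\lambda)$ be the periodogram of $\{\varepsilon_1,\ldots,\varepsilon_n\}$. Suppose the $\varepsilon_t$ satisfy Assumption (A-1) and that $y_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$ with spectral density $\frac{\sigma^2}{2\pi}g_{yy}(\phi_0,\lambda)$ such that $\sum_{j=0}^{\infty}|\psi_j|\,|j| < \infty$. Let $\varsigma(\cdot)$ be any continuous even function $[-\pi,\pi]\to\mathbb R$ with Fourier coefficients $\{z_k, -\infty<k<\infty\}$
such that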


fc=i 


and  f^^  q  (A)  QyyiPo,  X)dX  =  0,  then 

In,yy  (A)  ?  (A)  dA  A  iV      0,  4  ^  Qfet^ 

T  \  t,_i  , 

with  bk  =  Ejl-oo  7yy  {k  -  j)  Zj. 


fc=i 
B. Appendix - Proofs

Proof  of  Lemma  4.1  First  we  show  that  /„  (A)  €  Li  [—tt,7t]  which  follows  from  J^   \fa  (A)|  dX  = 
Y^°^_      —  — T    <  oo-  Next  note  for  a  typical  element  k 

J  —n 

/jr     oo       oo  oo       oo 

-•^  j=0  /=-oo  j=0  l=-oo 

oo        I  oo        I  ^       /  oo  oo 


gJA(j+fc) 


such  that  the  result  follows.  ■ 

Proof  of  Proposition  4.2  Using  equation  (4)  and  the  definition  for  h^{(f).,  A),  /n,2y  (A) 
can  be  written  as  Re[In,zy  W]  =  In,yy  (A)  Re[h^{(f),  A)](?!)  +  /^  (A)  In,ee  (X)  +  Re[Rn  (A)]  where 

+4>{e-'^)li.  (-A) u;,  i-X)  Rl^  (A)  +  l^  (-A)  |<^(e'^)|'  |i?^_^  (A)|' 
+i?i,^(-A)i?2(A) 

and 

a;,(A)  =  rne"^>e(A)+<^(A). 

The  two  remainder  terms  R^  ^  (A)  and  R^  (A)  are  defined  next.  First  define  the  se- 
quences Tp  =  {ipQ,ip-y,...}  and  Tpik)  =  {^o/<^fc! V'l/^i+A:)  •••}•  Then  let  the  inner  prod- 
uct i?^  g  (A)  =  -^Y^'jLoSj^~^^''Un,j  W  be  defined  for  a  sequence  s  =  {so,si,...}  where 

Un,j  (A)  =  Yl7=i-j  £te*'^*  —  Y17=i  ^te'^*.  We  can  now  write  the  remainder  term  R^  (A)  in  a 
compact  form 

Rl  (A)  =  -l^  (A)  0(e-^^)Hi,^(A)  +  a  (X)*  ©  i?;^_- (A) 

where  R^  -  (A)  =  [i?-^  ^(uW'  ...,R^  r,  ,(A)]  such  that  i?^  (A)  is  a  p  x  1  vector.  This  leads 
to  '  ' 

Re[In,zyiX)]dX    =      /    In,yyiX)Re[h%(j),X)]dX(f> 

+  /"  Ir^  (A)  /„,..  (A)  dA  +  r  Re[i?„  (A)]dA.  (1) 

J —ir  J —TV 

Alternatively  one  has  also  Re[In,yz  (A)  a  (—A)]  =  In,yy  (A)  Re[/i^((?!>,  A)]  -I-  Re[i?^  (A)  a(— A)] 
with  RP,  (A)  =  (l)-\e-^^)uj,  (A)  Rl  i-X)  +  Rj,  (A)  R^  (-A) .  Now  if  /Re  [i?^  (A)]  <r(A)dA  = 
$o_p(n^{-1/2})$, where $\varsigma(\lambda)$ has absolutely summable Fourier coefficients, it follows that $\hat\phi = \phi + O_p(n^{-1/2})$ and


Vn[j)-(j)j 


In,yy{X)Re[h''{cj,,X)]dX 


-1 


x-v/n 


r  Ir,  (A)  In,.e  (A)  c?A  +   T  Re[i?„  {X)]d\ 

J —  TT  J —TV 


Since  l^  (A)  G  C'^  [-7r,7r]  and  (?!>(e^'^)  G  C'^  [-tTjTt]  it  follows  from  Lemma  (A. 4)  that 

/•IT  /  OO  A 

V^J     ^^(A)J„,eaA)tiA^7V     0,^a,(z,(A),e'^^)(z,(A),e'^^)       , 
where  the  matrix  Ym^i  '^i  {^n  W  >  s'^')  {^n  (A) ,  e*'^')   has  typical  element 


1=1 


J^ai{l^iX),e'^')(lr,{X),e 


i\l 


=  J2'^J+kvl'l'4-,J%,J+\k-l\- 
k,l        j=0 


Using  Lemma  (4.1)  it  is  easy  to  show  that 


5^a,(/,(A),e^^A(z,(A),e'^^- 


faiX-Ov{<l>,Ov{^,X)  d^dX  +  -,  /     77  a  A)  77  (<^,  A)  dX 


a'* 


The  result  then  follows  if  ^/n  X^tt  -^  (A)  dX  =  Op  (1) .  We  discuss  the  four  different  types 
of  remainder  terms  separately.  Since  Z^  (A)  and  Z^  (A)  (j){e^^)  have  absolutely  summable 
Fourier  coefficients  and  are  uniformly  bounded  on  [— 7r,7r]  the  proof  of  Lemma  (A. 2)  can 
be  applied  to  the  remainder  terms  involving  R\  ^  (A)  alone. 

To  show  that  -Jnj^^(f)~^{e^^)u)^{—X)R'^{X)dX  =  Op(l)  it  is  enough  to  show  that 
element  by  element 


Ey/n 


0. 


[%-'ie^^)u,{-X)Rl,,{X)dX 

J  —IT 

Using  the  definition  of  /?^  (A)  one  has 

EV^     r  r'  (e^^)'^e  (- A)  Rlf,  (A)  dX 

J  —  IT 

<     E^     r  l^^k{X)ct>'\e'^)<p{e-'^)u,{-X)R\^^{X)dX 
J —IT 

£  r'  (f")  -.  (-A)  n- V2e-A'=iii  -  ^^^  (A)  dX 


+Ey/n 
The first term can be analyzed by setting $\varsigma(\lambda) = l_{\eta,k}(\lambda)\,\phi^{-1}(e^{i\lambda})\,\phi(e^{-i\lambda})$ and applying the proof of Lemma (A.2). Next look at a typical element $k$ of the second term


EV^ 


i^i 


^     oo     oo      J       n       ,, 

/  E  E  E  E  S^'.  (='-  -  -"--)  --"<'-'-'*"<iA 

■^ -^  j=0  1=0  r=l  t=l  "'=+J 


Then  for  j,  Z,  and  r  fixed  such  that  l<r  —  k  +  l<n 


E 


1/2  1/2 


Now  summing  over  j,  m  gives 

i,  U^)  u>,  i-X) n-V2  Y^ ^.e-'^Wnj  (A)  ?  (A)  dX 
This  holds  true  for  all  k  =  l,...,p.  Next  consider 


fc+/    '    '-"j-n-k+l 


<sup24/vv2j;j: 


j=0  /=0 


^,-^/ 


"fc+j 


lJl-0. 


+Ey/n 


r  Rli-X)Rl,{X)dX 

J—TT 


Vfc(A)0(e-'^)|i?i(A)|'dA 


r  R\  (-A)  n-1/2  V  Ae-^(^+'=)C/„,,  (A)  dA 

V-TT  ^   OLj  +  k 


such  that  the  first  term  goes  to  zero  by  the  proof  of  Lemma  (A. 2).  The  second  term  then 


is 


E^ 


OO  / 

Ri  (_A)  n-1/2  y  J^e-Mi+fc)c/^  .  (A)  dX 


<    En 


<    n-^l'' 


<     n 


1=0  j=0  "j+'= 


<    n-V2 


/TT 
-TT 

EE 

■'^  /=o  j=0 

/■TT      OO  .  \    1/2     "" 

/TI"      CO  OO 

ElV'dmin(/,n)^/2E 
'^  Z=0  j=0 


(fA 


E\Un,l{->^)\\UnjW\dX 


(^it^n,,(A)r 


1/2 


dX 


0!j+k 

mm{j,n)^^^dX^O. 
The remaining terms can be shown to go to zero by the same arguments and the proofs are omitted. This completes the proof of the proposition. ■

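Lemma B.1. For $\hat\alpha_l(\phi) = \frac{1}{n}\sum_{t=l+p+1}^{n}\varepsilon_t(\phi)^2\varepsilon_{t-l}(\phi)^2$, $\alpha^n_{l,\phi} = E\hat\alpha_l(\phi)$ with $\varepsilon_t(\phi)$ defined in (2), $\varepsilon_t$ satisfying Assumptions (A-1) and (B-1) and any fixed $\delta<\infty$ it follows that

$$\max_l \mathrm{var}\left(n^{1/2}\sup_{\phi\in N_\delta(\phi_0)}\left|\hat\alpha_l(\phi) - \alpha^n_{l,\phi}\right|\right) = O(1).$$

Proof. Let $\mathbf y_t = [y_t,\ldots,y_{t-p}]'$ and $\boldsymbol\phi = [1,\phi']'$. Then

$$\alpha_{l,\phi} = \mathrm{vec}(\boldsymbol\phi\boldsymbol\phi')'\,E\left(\mathbf y_t\mathbf y_t'\otimes\mathbf y_{t-l}\mathbf y_{t-l}'\right)\mathrm{vec}(\boldsymbol\phi\boldsymbol\phi')$$

and $\alpha^n_{l,\phi} = \alpha_{l,\phi}(n-l-p)/n$. Letting $\phi_0 = 1$ we can write

$$\begin{aligned}
\varepsilon_t(\phi)^2\varepsilon_{t-l}(\phi)^2 &= \mathrm{vec}(\boldsymbol\phi\boldsymbol\phi')'\left(\mathbf y_t\mathbf y_t'\otimes\mathbf y_{t-l}\mathbf y_{t-l}'\right)\mathrm{vec}(\boldsymbol\phi\boldsymbol\phi') \\
&= \sum_{q_1=0}^{p}\cdots\sum_{q_4=0}^{p}\phi_{q_1}\cdots\phi_{q_4}\,y_{t-q_1}y_{t-q_2}y_{t-l-q_3}y_{t-l-q_4}.
\end{aligned}$$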

Now  define  fi'n,y...y{qi,  -,94,0  =  n'^  Y17=i+p+l  yt-qi  '  --yt-i-qi  -  Eyt-q^  ■  --yt-i-q^.  Then 

sup      \ai  {(f))  -  ai^^l 
p  p 

<  sup  '^■■■Y^\^g^---(J}g^\\il'^yy{qi,...,qi,l)\.  (2) 

<l>^N6{<l>o)  q^=Q  54=0 

Expression  (2)  can  be  bounded  by 

/  s  4    p  p 

fmax(|(?!)i_o|  +  ^)  J    XI  ' ' '  XI  |An,2/...2/(9i>  •••>94,0| 

^     '  ^91=0         94=0 

so  that  it  remains  to  show  var  |An,y...2/(9ij  •■■)94)0|  =  0{n~^)  for  all  /.  Now  var  \jj!n^y„,y\  is 

n  n 

"~^       X  X       COv{yt-q^  ■  ■■yt-l-q^,ys-qs  ■  --ys-l-qs)- 

t=l+l+p  s=l+k+p 

We  can  substitute  for  the  definition  of  yj  =  ^  '4'j^t-j  which  leads  to 

n  n  oo  oo 

"~^       X  Yl        X  ■  ■  ■  X  ^Jl  ■  ■  ■^J8^°^(^*-9l-Jl  ■  ■■^t-l-qi-H,£s~qs.-h  '  '  '  ^s-k-qs-js)  ■ 

t=l+l+ps=l+k+pji=0        J8=0 

(3) 

The  covariances  can  be  represented  by  eighth  and  lower  order  cumulants  by  considering 
the  following  matrix 


X{t,sJ,qi  -ji,...,q8-J8)  = 


^t-qi-jl       ^t-q2-h       ^t-l-q3-J3       ^t-l~q4-J4 
^s—qz—jb      ^s-qe—je      ^s-l—qr-jr      ^s-l-qs—js   . 
with typical element $X_{ij}$ where reference to the time indices is suppressed. Then from Brillinger (1981), Theorem 2.3.2,

$$\mathrm{cov}\left(\prod_{j=1}^{4}X_{1j},\prod_{j=1}^{4}X_{2j}\right) = \sum_{\nu}\prod_{\nu_s\in\nu}\mathrm{cum}\left(X_{ij};\ (i,j)\in\nu_s\right)$$

where $\mathrm{cum}(X_{ij};\ (i,j)\in\nu_s)$ is the joint cumulant of all the $X_{ij}$ with indices $(i,j)\in\nu_s$ and the sum is over all indecomposable partitions $\nu$ of the table

$$\begin{matrix}(1,1) & \cdots & (1,4)\\ (2,1) & \cdots & (2,4).\end{matrix}$$

A definition of indecomposable partitions is given in Brillinger (1981), p. 20. We note that all first cumulants are zero. Second cumulants are nonzero only if the time indices of $\varepsilon_t$ coincide. By indecomposability of partitions there is at least one cumulant with elements from both rows in $X$ in each product $\prod_{\nu_s\in\nu}\mathrm{cum}(X_{ij};\ (i,j)\in\nu_s)$. By strict stationarity

$$n^{-2}\sum_s\sum_t\mathrm{cov}\left(\prod_{j=1}^{4}X_{1j},\prod_{j=1}^{4}X_{2j}\right) = n^{-1}\sum_s\mathrm{cov}\left(\prod_{j=1}^{4}X_{1j},\prod_{j=1}^{4}X_{2j}\right)$$

where the time indices for $t$ have been normalized to zero. The summability assumption (B-1) implies that $\sum_s\mathrm{cov}\left(\prod_{j=1}^{4}X_{1j},\prod_{j=1}^{4}X_{2j}\right) < \infty$ uniformly in $l, j_1,\ldots,j_8$. This shows that the term (3) is of order $n^{-1}$ uniformly in $l$ as had to be shown. ■
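
The sample quantity controlled by Lemma B.1 is simple to compute. The sketch below evaluates $\hat\alpha_l(\phi) = n^{-1}\sum_{t=l+p+1}^{n}\varepsilon_t(\phi)^2\varepsilon_{t-l}(\phi)^2$ for an AR($p$) model. The function names are hypothetical, and the residual convention $\varepsilon_t(\phi) = y_t - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}$ is an assumption carried over from the AR(1) case of Section 6.

```python
import numpy as np

def ar_residuals(y, phi):
    """Residuals eps_t(phi) = y_t - phi_1*y_{t-1} - ... - phi_p*y_{t-p}, t = p+1..n."""
    y = np.asarray(y, dtype=float)
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    p = phi.size
    eps = y[p:].copy()
    for k in range(1, p + 1):
        eps -= phi[k - 1] * y[p - k : y.size - k]
    return eps

def alpha_hat(y, phi, lag):
    """(1/n) * sum over t = lag+p+1..n of eps_t(phi)^2 * eps_{t-lag}(phi)^2."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    n = np.asarray(y).size
    eps = ar_residuals(y, phi)
    return np.sum(eps[lag:] ** 2 * eps[:-lag] ** 2) / n

toy = [1.0, 2.0, 4.0, 8.0, 16.0]           # toy series, not simulated data
a1_no_filter = alpha_hat(toy, [0.0], 1)     # residuals equal y_2..y_5
a1_exact_fit = alpha_hat(toy, [2.0], 1)     # y_t = 2*y_{t-1}, so residuals vanish
```

The division by $n$ rather than by the number of summands mirrors the definition in the lemma, which is why its expectation carries the factor $(n-l-p)/n$.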


Proof  of  Proposition  5.1  For  the  first  part  of  the  proposition  we  have  to  show  that 
for  any  ry,  e  >  0 

(0„,A)-/i(0o,A) 


lim  P{    sup 


Ae[-7r,7r] 


hr, 


>  77)  <  e. 


This  holds  if  there  is  a  ^  and  a  neighborhood  A^^  {(J)q)  of  (pQ  such  that  A^^  ((^g)  is  contained 
in  the  interior  of  the  stationary  region  of  the  parameter  space  and 


lim  P(    sup        sup        hn{(l),X) -h{(po,X)     >  rj)  +  lim  P{(j)  ^  Ng  {(f)o))  <  e. 

Consistency  of  4>  implies  P{(p  €  A^^  {4>o))  — >  1  so  that  only  the  first  term  needs  to  be 
considered.  From 


hni(f),X)  -  h{(f)Q,X) 


Re 


f^(A)</>(e-^) 


Re 


< 


hfi{X)Ue-^^) 
1 


-iA> 


l^  (A)  </.(e-'^)  -  /^,o  (A)  M^~n 


+ 


i^(-X)^{e'^)-l^^o{-X)Me'^) 


and 


< 


l^{X)cp{e-^^j-l^,o{X)Me-'^) 

kW-h,o{X)     <t>{e-^^)   +||Z^,o(A)||  0(e-^)-(/)o(e-^^) 
it is enough to show that

$$\sup_{\lambda}\sup_{\phi\in N_\delta(\phi_0)}\left\|\hat l_\eta(\lambda) - l_{\eta,0}(\lambda)\right\| = o_p(1),$$


while  \(l){e'-^)  —  (f>Q  (e''^)|  <  57/2  on  Ns  {4>o)  by  uniform  continuity  of  ^q  {^^^)  o^i  [-t'",'?'']  ■ 
To  estabhsh  sup^^sup^^g^y^^^^^^    Z,/,  (A)  —  /^,o  (A)     =  Op  (1)  it  is  enough  to  look  at  a  typical 

^■ip,k  (A)  —  ltp,kfi  (A)   -  Now  let  ctj  =  Ee1e1_j,  bj^k  denotes  the  A;-th  element  of  bj  and 


element 


aj^tj)  =  Eel{4i)el_A(j)).  Then,  using  the  definition  of  l^^k  (A) 


sup         sup 

Ae[-7r,7r](;6eiVi(<^o) 


'i/',fe  (A)  —  Z^,fc,o  (A) 


sup     sup 


<    sup     sup 


n—p—1  oo 

j=i  j=i 

n—p—l 


Yl  (^j(^)    ^hk,4>-(^j,lhk,4>  +  (^jibj,k,ct>-aj^bj^kfi)e  '^^ 


j=i 


(4) 


+  sup 

A 


j>n—p 


We  note  that  sup;^ 

Next 


Er>n-p  cq\ks>e-''^    <  sup,,  afn-'/'  T.T>n-pj"'  \hkfi\  =  o{n-'''). 


n—p—\ 


< 


n—p—l 


+ 


n—p—l 

E  {oc-\bi^k,^-b,,kfl))e-''^^ 

3  =  1 


(5) 


(6) 


It  is  therefore  enough  to  show  that  (5)  and  (6)  go  to  zero  uniformly  on  [— 7r,7r]  as  6  — >  0. 
First  consider  (6). 


sup 

Ae[-7r,7r] 


<       sup 

Ae[-7r,7r] 


n—p—l 


{(f)Q,n))dn 


1 

+-4     sup 

(^     A£[-7r,7r] 


n—p—l 


(7) 




where,  for  cf)  E  Ng  {(po) ,  the  finite  Fourier  approximation  of  77  {(f>,  ^)  converges  uniformly 
such  that 


sup 

Ae[-7r,7r] 


=        sup 

AG[-7r,7r] 


n—p—1 


yZ    ihk,4>  -  ^j,kfi)  e' 


■i\j 


J  =  l 


(77  {cp,  X)-fj  (</>o,  A))fc  -    Y,    ibj^k,4>  -  bj,k,o)  e-'^' 


]=n—p 


<        sup     \7]{4>,X)-f]((f)Q,X)\f.  +  2     rap       V    \bj^k,4 


AGf-TT.TT 


<     e  +  2     sup       ^    \bj,k,<p\ 

4>eNs{4>o)j=n-p 

Then,  letting  ctj  =  a~^  —  a~^  the  first  term  in  (7)  is  dominated  by 


<        sup 

Ag[— 7r,7r]  J —TT 


n—p—1 

j=i 


\Vk{4>,iA  -'?fc(0o>M)M/^ 


n—p—1 


< 


j=—n+p+l 

<    Ce{5). 


„_, ,  1  J—n 


n—p—1 


The  constant  C  is  bounded  by  X^,=i 


ai 


j  Wo) 


<  Ejli  Qj  (-^o) 


continuity  of  7)  (cj),  fj)  on  [— vr,  tt]  x  A^^  ((/)q)  we  have  \ri  {(p,  fi)  —  f]  {(f>Q,  ^) 


<  00.  From  uniform 
[.  <  e  for  some  5  >  0 


from  which  it  follows  that  J_^ 
bounded  by  (27r^°l^  dj  ((/>o)~ 


ri{(j),fx)  —  f]  ((f)Q,fj,)\i^dfi  <  2'ne.  The  constant  C  can  be 
+  0-4)  independent  of  n  and  sup^g^^(^^)  Ejln-p  l^j.fc.^l 


goes  to  zero  as  n  — >  00.  Next  we  consider  (5).  First  look  at 


Pj>-^jl 


From  aj_0  =  E[4>{L)yt)^ {(f){L)yt-j)'^  it  follows  that  a^^^  >  a^  >  0  since  otherwise  (j){L)yt  = 
0  a.s.  But,  since  (f){L)yt  =  (f){L)(f)Q  {L)et  is  an  ARMA{p,p)  process  with  parameters  0q 
and  (j),  4>{L)yt  has  nonzero  variance  contradicting  (f>{L)yt  =  0  a.s.  Then 

(«j,0)~^  -  (^J^    <  Ci  \aj,4,  -  aj\ 

for  some  constant  Ci.  Since  Eej  <  00  we  can  uniformly  bound  \aj^^  —  aj\  by  6C2  where 
C2  is  a  finite  constant  depending  on  (f>Q  and  Eef.  Then 


n—p—1 


n—p—1 

j=l 
where $\sum_{j=1}^{n-p-1}|b_{j,k,\phi}| < \infty$ on $N_\delta(\phi_0)$. Now turn to the first two terms of (4)


n—p—l 

n—p—l 

<        sup       Y^    \aj  {(f))''^  -  aj^^-^ 


^j,k,4>\ 


Now 


"j  (</•)   ^  -  ",i   <  ("j  (0)  a^-,,^)   1  |aj  (</))  -  aj, <^|  and 


J'+P 


l^i  {4>)  -  ajj  <  \aj  {(f))  -  a] J  +  — —  |aj- ^ 


n 


where  a^  .  =  "  ^  ^^^.(^  is  the  expected  value  of  a*  ((^) .  Then 


n—p—l  n-p—l 

w— p— 1 

n 


First  note,  that  since  Qj^^  is  bounded  away  from  zero  and  aj  {(f>)~  <  cn^/^~'",  we  can 
replace  {ctj  ((f))  aj^^)"^  by  n^^'^~'"  times  a  constant  that  does  not  affect  the  argument.  Then 
the  second  term 

n—p—l 

^-i/2-i;    ^  {j  +p)  \aj^^\  \bj^k,4>\  -^  0  as  n  ^  CO 

3=1 

uniformly  on  Ng  ((^q)  •  The  first  term  now  is  shown  to  go  to  zero  in  probability  by  looking 
at 

P{    sup    ni/2-  Y"-^~  \^J  i^)  -  ^U  \hk,4>\  >  n) 

The  term  n}''^~^  X]?=f~     ^i  i^)  ~  ^?<4   l^j,fc,</>l  ^^  dominated  by 


^n-p-l .     ^ 


n—p—l 


The  last  term  is  zero  with  probability  tending  to  one  if  maxj  P{sup^^j^^/^  \  a^  -  <  d„)  — >  0 

which  follows  form  maxj  P(n-^/^  sup j,g^(0  )('-''*i  d>  ~ '^1  d>^  >  ry)  — >  0.  By  Markov's  inequality 
the  first  term  is  bounded  by 


n  s—^n-p-l 


n"'     sup      |a*(<^)-a^^^| 


4>&Ns{ct>o) 


1/2 


<     ^^YT.   ^      \hk,<l>\\^ar{n^^^i     sup      \a*  {^)  -  al^\)) 
The result then follows from $n^{-\nu}\sum_{j=1}^{n-p-1}|b_{j,k,\phi}| = O(n^{-\nu})$ if

$$n\max_j\mathrm{var}\left(\sup_{\phi\in N_\delta(\phi_0)}\left|\hat\alpha_j(\phi) - \alpha^n_{j,\phi}\right|\right) = O(1).$$

This is shown in Lemma (B.1).

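It remains to show that $P\left(\rho(\hat h_n(\hat\phi_n,\lambda), h(\phi_0,\lambda)) > \delta\right)\to 0$. Using the result from the first part it is enough to show that $\sup_{\lambda\in[-\pi,\pi]}\left\|\partial\hat h_n(\hat\phi_n,\lambda)/\partial\lambda - \partial h(\phi_0,\lambda)/\partial\lambda\right\| = o_p(1)$.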
Since 

implies  that  f]{(l),X),^  e  C^[-7r,n]  V/i  <  oo  and  d^r]{(j>,-T^)^ /dX^  =  d^i]{(i),T:)^ /dX^  it 
follows  from  Folland  (1984),  Theorem  8.22e,  that  {ij^hj  {(j))  =  J  (^T]((f),X))  e-'^^dX.  By 
Bernstein's  Theorem,  ^\j\  ~    \bj  {(p)\  <  oo  such  that 

^(^j)6,(</>)e'^^■->^r,(</>,A) 

uniformly on $[-\pi,\pi]$. Using these facts, and noting that $l_\eta(\lambda)\in C^2[-\pi,\pi]$, the proof of the first part can be applied to the first derivative.

Since $(\mathcal H,\rho)$ is a complete metric space, $P\left(\rho(\hat h_n(\hat\phi_n,\lambda), h(\phi_0,\lambda)) > \delta\right)\to 0$ implies that $P(\hat h_n\in\mathcal H)\to 1$. ■
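Proof of Theorem 5.2 We start by obtaining an expression for $R_n(\lambda)$ in (6). From the proof of Lemma (A.2) we have

$$\omega_y(\lambda) = \phi_0^{-1}(e^{-i\lambda})\,\omega_\varepsilon(\lambda) + n^{-1/2}\sum_{j=0}^{\infty}\psi_j e^{-i\lambda j}U_{n,j}(\lambda).$$

Letting $R^1_n(\lambda) = n^{-1/2}\sum_{j=0}^{\infty}\psi_j e^{-i\lambda j}U_{n,j}(\lambda)$ we can write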

In,yy{X)      =      UJy{X)Uy{-X) 

=    ^y  (A)  ^y  (-A)  a  (A)  (/.q  +  Uy  (A)  u,  (-A)  +  cjy  (A)  0o  (e'^)  <  (-A) 

=      In,yy  (A)  a  (A)  4>o  +  In,ee  (A)  <Po'  [^"^)  +  ^e  (-A)  hi  (A) 

+u,{X)-p^Ri{-X)  +  \Ri{-X)\' 
00  (e  ^-^j 

In  the  same  way  we  also  have 

In,yy  (A)      =      In,yy  (A)  a  (-A)  <^o  +  ^n,..  (A)  <t>o^  (e^^)  +  o;,  (A)  R],  (-A) 
Let


Rn  (A)  =  Re 


a;,  (A)  i?i  (-A)  +  ^,  (-A)  %^i?i  (A)  +  |i?i  (-A)I' 


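Next write $I_{n,yy}(\lambda)\,h_0(\phi_0,\lambda)$ for the filter based on the true parameter $\phi_0$ as

$$\begin{aligned}
I_{n,yy}(\lambda)\,h_0(\phi_0,\lambda) &= I_{n,yy}(\lambda)\,\mathrm{Re}\left[l_{\eta,0}(-\lambda)\phi_0(e^{i\lambda})a(-\lambda)\right]\phi_0 \\
&\quad + I_{n,\varepsilon\varepsilon}(\lambda)\,l_{\eta,0}(\lambda) + h_0(\phi_0,\lambda)R_n(\lambda)
\end{aligned}$$

while a filter based on a different parameter $\phi$ is expressed as

$$\begin{aligned}
I_{n,yy}(\lambda)\,h(\phi,\lambda) &= I_{n,yy}(\lambda)\,\mathrm{Re}\left[l_\eta(-\lambda)\phi(e^{i\lambda})a(-\lambda)\right]\phi_0 \\
&\quad + I_{n,\varepsilon\varepsilon}(\lambda)\,h_{\phi_0}(\phi,\lambda) + h(\phi,\lambda)R_n(\lambda)
\end{aligned}$$

with $h_{\phi_0}(\phi,\lambda) = \mathrm{Re}\left[l_\eta(-\lambda)\phi(e^{i\lambda})\phi_0^{-1}(e^{i\lambda})\right]$. Then by the proof of Lemma (A.2) it follows that $\sqrt n\int R_n(\lambda)\varsigma(\lambda)\,d\lambda = o_p(1)$ for all continuous $\varsigma(\lambda)$ with absolutely summable Fourier coefficients. Next (11) is established if for all $\vartheta,\varepsilon>0$ there exists a $\delta>0$ such that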


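$$\limsup_{n\to\infty} P\left(\sup_{h\in\mathcal H,\ \rho(h,h_0)\le\delta}\|v_n(h) - v_n(h_0)\| > \vartheta\right) < \varepsilon$$

which in turn follows from

$$\limsup_{n\to\infty} P\left(\sup_{h\in\mathcal H,\ \rho(h,h_0)\le\delta}\left\|\sqrt n\int_{-\pi}^{\pi} I_{n,\varepsilon\varepsilon}(\lambda)\left(h_{\phi_0}(\phi,\lambda) - l_{\eta,0}(\lambda)\right)d\lambda\right\| > \vartheta\right) < \varepsilon \qquad (8)$$

and

$$\limsup_{n\to\infty} P\left(\sup_{h\in\mathcal H,\ \rho(h,h_0)\le\delta}\left\|\sqrt n\int_{-\pi}^{\pi} R_n(\lambda)\,[h - h_0]\,d\lambda\right\| > \vartheta\right) < \varepsilon. \qquad (9)$$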

For (9) define the open neighborhood

$$\mathcal H_\delta = \left\{h: [-\pi,\pi]\to\mathbb R^p \mid h\in\mathcal H,\ \rho(h,h_0) < \delta\right\}.$$

Now for all $h\in\mathcal H_\delta$ it is the case that $h(\lambda)\in C^1[-\pi,\pi]$. By Bernstein's theorem this implies absolute summability of the Fourier coefficients of $h$. In turn $h - h_0$ is also continuous and uniformly bounded with absolutely summable Fourier coefficients such that (9) follows from the proof of Lemma (A.2).

Next turn to (8). The main idea of the proof is taken from Robinson where integration by parts is used to separate $h$ from $I_{n,\varepsilon\varepsilon}(\lambda)$.

sup  y/n 

h&Hs 

<      sup  y/n 

h&Hs 


+  sup  Vn 


r  In,eeW{h4,,{<P,X)-lr^fiW)dX 

J  —TV 

r  {In,,e  (A)  -  EIn,ee  (A))  {h^,  (</>,  A)  -  lr,,0  (A))  dX 

J  —IT 

EIn,ee{X){h<t>,{(l),X)-lr,fiiX))dX 
Since $EI_{n,\varepsilon\varepsilon}(\lambda) = \sigma^2$ it follows that the last term is $\sigma^2\int_{-\pi}^{\pi}\left(h_{\phi_0}(\phi,\lambda) - l_{\eta,0}(\lambda)\right)d\lambda$. Now note that $\int_{-\pi}^{\pi} l_{\eta,0}(\lambda)\,d\lambda = 0$ and $\int_{-\pi}^{\pi} h_{\phi_0}(\phi,\lambda)\,d\lambda = 0$. Next use integration by parts


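\[
\begin{aligned}
\sup_{h\in\mathcal{H}_\delta} \sqrt{n}\,\Big\| \int_{-\pi}^{\pi} \big(I_{n,\varepsilon\varepsilon}(\lambda)-EI_{n,\varepsilon\varepsilon}(\lambda)\big)\big(h_{\phi_0}(\phi,\lambda)-l_{\eta,0}(\lambda)\big)\,d\lambda \Big\|
&\le \sup_{h\in\mathcal{H}_\delta} \sqrt{n}\,\Big\| \int_{-\pi}^{\pi} \frac{\partial}{\partial\lambda}\big(h_{\phi_0}(\phi,\lambda)-l_{\eta,0}(\lambda)\big) \int_{-\pi}^{\lambda} \big(I_{n,\varepsilon\varepsilon}(\mu)-EI_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu\,d\lambda \Big\| \\
&\quad + \sup_{h\in\mathcal{H}_\delta} \sqrt{n}\,\Big\| \big(h_{\phi_0}(\phi,\pi)-l_{\eta,0}(\pi)\big) \int_{-\pi}^{\pi} \big(I_{n,\varepsilon\varepsilon}(\mu)-EI_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu \Big\|.
\end{aligned}
\tag{10}
\]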
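The centring by $EI_{n,\varepsilon\varepsilon}(\lambda) = \sigma^2$ used before (10) only requires the errors to be uncorrelated, not independent. The following Python sketch illustrates this under stated assumptions: the periodogram is normalised as $I_n(\lambda) = n^{-1}\big|\sum_{t=1}^{n}\varepsilon_t e^{-i\lambda t}\big|^2$ (the normalisation is not displayed in this part of the proof, but it is the one consistent with $EI = \sigma^2$), and the errors are simulated from an ARCH(1) process with unit unconditional variance, which is just one example of a martingale difference sequence. The function names and the parameter values are hypothetical.

```python
import cmath
import math
import random


def periodogram(eps, lam):
    """I_n(lam) = |sum_t eps_t * exp(-i*lam*t)|^2 / n (assumed normalisation, no 2*pi factor)."""
    n = len(eps)
    d = sum(e * cmath.exp(-1j * lam * t) for t, e in enumerate(eps, start=1))
    return abs(d) ** 2 / n


def arch1(n, gamma, rng, burn=100):
    """ARCH(1) errors with unit unconditional variance: uncorrelated but dependent."""
    e, out = 0.0, []
    for t in range(n + burn):
        e = math.sqrt(1.0 - gamma + gamma * e * e) * rng.gauss(0.0, 1.0)
        if t >= burn:  # drop the burn-in draws so the start-up value has little effect
            out.append(e)
    return out


rng = random.Random(0)
n, gamma, reps, lam = 64, 0.3, 2000, 1.0

# Monte Carlo estimate of E I_n(lam); the claim in the text corresponds to sigma^2 = 1 here.
mc_mean = sum(periodogram(arch1(n, gamma, rng), lam) for _ in range(reps)) / reps

# Exact, non-random identity: averaging I_n over the n Fourier frequencies
# returns the sample second moment of the errors.
eps = arch1(n, gamma, rng)
avg_over_freqs = sum(periodogram(eps, 2 * math.pi * s / n) for s in range(n)) / n
second_moment = sum(e * e for e in eps) / n
```

The Monte Carlo mean should be close to one up to simulation error, while `avg_over_freqs` and `second_moment` agree up to rounding for any sample.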


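Now for $h \in \mathcal{H}_\delta$ both $\big\|\frac{\partial}{\partial\lambda}\big(h_{\phi_0}(\phi,\lambda)-l_{\eta,0}(\lambda)\big)\big\|$ and $\big\|h_{\phi_0}(\phi,\pi)-l_{\eta,0}(\pi)\big\|$ are uniformly
bounded by $C\delta$ for some constant $C < \infty$ such that

\[
\sup_{h\in\mathcal{H}_\delta} \sqrt{n}\,\Big\| \int_{-\pi}^{\pi} \frac{\partial}{\partial\lambda}\big(h_{\phi_0}(\phi,\lambda)-l_{\eta,0}(\lambda)\big) \int_{-\pi}^{\lambda} \big(I_{n,\varepsilon\varepsilon}(\mu)-EI_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu\,d\lambda \Big\|
\le C\delta\,\sqrt{n} \int_{-\pi}^{\pi} \Big| \int_{-\pi}^{\lambda} \big(I_{n,\varepsilon\varepsilon}(\mu)-EI_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu \Big|\,d\lambda.
\]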


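It remains to show that $\sqrt{n}\int_{-\pi}^{\pi}\big|\int_{-\pi}^{\lambda}\big(I_{n,\varepsilon\varepsilon}(\mu)-EI_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu\big|\,d\lambda$ is bounded in probability.
Let $I^{\circ}_{n,\varepsilon\varepsilon}(\mu) = I_{n,\varepsilon\varepsilon}(\mu) - EI_{n,\varepsilon\varepsilon}(\mu)$ and define the function

\[
\tau(\lambda,\mu) =
\begin{cases}
1 & \mu < \lambda \\
0 & \mu > \lambda
\end{cases}
\]

Since $I_{n,\varepsilon\varepsilon}(\mu) = I_{n,\varepsilon\varepsilon}(\mu + 2\pi)$ we work with $\sqrt{n}\int_{0}^{2\pi}\big|\int_{0}^{\lambda}\big(I_{n,\varepsilon\varepsilon}(\mu)-EI_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu\big|\,d\lambda$.
Letting $\mu_s = 2\pi s/n$, we first show that

\[
\sqrt{n}\int_{0}^{2\pi} \Big| \frac{2\pi}{n}\sum_{s=1}^{n-1} \tau(\lambda,\mu_s)\,I^{\circ}_{n,\varepsilon\varepsilon}(\mu_s) - \int_{0}^{2\pi} \tau(\lambda,\mu)\,I^{\circ}_{n,\varepsilon\varepsilon}(\mu)\,d\mu \Big|\,d\lambda = O_p(1).
\]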
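The deterministic ingredient of this approximation is the Riemann sum of the step function $\tau$ over the Fourier frequencies $\mu_s = 2\pi s/n$. A minimal Python sketch, assuming $\lambda \in [0, 2\pi]$ and assigning the value zero at $\mu = \lambda$ (a convention that does not affect the integral), checks that the sum differs from $\int_0^{2\pi}\tau(\lambda,\mu)\,d\mu = \lambda$ by at most $2\pi/n$. The helper names `riemann_error` and `worst_error` and the evaluation grid are hypothetical.

```python
import math


def tau(lam, mu):
    """Step function from the proof: 1 if mu < lam, 0 otherwise."""
    return 1.0 if mu < lam else 0.0


def riemann_error(lam, n):
    """(2*pi/n) * sum_{s=1}^{n-1} tau(lam, mu_s) minus int_0^{2*pi} tau(lam, mu) dmu."""
    riemann = (2 * math.pi / n) * sum(tau(lam, 2 * math.pi * s / n) for s in range(1, n))
    exact = lam  # for 0 <= lam <= 2*pi the integral of the indicator is lam itself
    return riemann - exact


def worst_error(n, grid=500):
    """Largest absolute error over an equally spaced grid of lam values in [0, 2*pi]."""
    lams = [2 * math.pi * j / grid for j in range(grid + 1)]
    return max(abs(riemann_error(lam, n)) for lam in lams)


errors = {n: worst_error(n) for n in (10, 100, 1000)}
```

Multiplying the worst-case error by $\sqrt{n}\,\sigma^2$ gives a bound of order $n^{-1/2}$ that does not depend on $\lambda$.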


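The inner integral can be split into a part

\[
\begin{aligned}
\frac{2\pi}{n}\sum_{s=1}^{n-1} \tau(\lambda,\mu_s)\,I_{n,\varepsilon\varepsilon}(\mu_s) - \int_{0}^{2\pi} \tau(\lambda,\mu)\,I_{n,\varepsilon\varepsilon}(\mu)\,d\mu
&= \sum_{s=1}^{n-1} \int_{2\pi s/n}^{2\pi(s+1)/n} \Big[ \big(\tau(\lambda,\mu_s)-\tau(\lambda,\mu)\big) I_{n,\varepsilon\varepsilon}(\mu_s) + \tau(\lambda,\mu)\big(I_{n,\varepsilon\varepsilon}(\mu_s)-I_{n,\varepsilon\varepsilon}(\mu)\big) \Big]\,d\mu \\
&\quad - \int_{0}^{2\pi/n} \tau(\lambda,\mu)\,I_{n,\varepsilon\varepsilon}(\mu)\,d\mu
\end{aligned}
\tag{11}
\]

and $\sigma^2\big(\frac{2\pi}{n}\sum_{s=1}^{n-1}\tau(\lambda,\mu_s) - \int_{0}^{2\pi}\tau(\lambda,\mu)\,d\mu\big)$. If $i$ is the largest index such that $\mu_s < \lambda$ for all $s \le i$, then $2\pi(i+1) \ge \lambda n$ and
$2\pi i < \lambda n$, implying that $0 < \lambda - 2\pi i/n \le 2\pi/n$ such that

\[
\sqrt{n}\,\sigma^2\,\Big| \frac{2\pi}{n}\sum_{s=1}^{n-1} \tau(\lambda,\mu_s) - \int_{0}^{2\pi} \tau(\lambda,\mu)\,d\mu \Big| = O(n^{-1/2})
\]

uniformly in $\lambda$. Using Markov's inequality we look at each term of (11) separately. First

\[
\sqrt{n}\,E\,\Big| \sum_{s=1}^{n-1} \int_{2\pi s/n}^{2\pi(s+1)/n} \big(\tau(\lambda,\mu_s)-\tau(\lambda,\mu)\big) I_{n,\varepsilon\varepsilon}(\mu_s)\,d\mu \Big|
\le \sqrt{n} \sum_{s=1}^{n-1} \int_{2\pi s/n}^{2\pi(s+1)/n} \sup_{\mu} \big|\tau(\lambda,\mu_s)-\tau(\lambda,\mu)\big|\,\sigma^2\,d\mu
\]

where $\sup_{\mu} |\tau(\lambda,\mu_s) - \tau(\lambda,\mu)| = 0$ if $\lambda \notin [2\pi s/n, 2\pi(s+1)/n]$ and 1 otherwise. Therefore

\[
\sqrt{n} \sum_{s=1}^{n-1} \int_{2\pi s/n}^{2\pi(s+1)/n} \sup_{\mu} \big|\tau(\lambda,\mu_s)-\tau(\lambda,\mu)\big|\,\sigma^2\,d\mu \le \sigma^2\,\frac{2\pi}{\sqrt{n}}
\]

uniformly in $\lambda$. Also

\[
\sqrt{n}\,E\,\Big| \int_{0}^{2\pi/n} \tau(\lambda,\mu)\,I_{n,\varepsilon\varepsilon}(\mu)\,d\mu \Big| \le \sigma^2\,\frac{2\pi}{\sqrt{n}}
\]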


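where the bound is again uniform in $\lambda$. Finally, since $E\big(I_{n,\varepsilon\varepsilon}(\mu_s) - I_{n,\varepsilon\varepsilon}(\mu)\big) = 0$,

\[
\sqrt{n}\,E\,\Big| \sum_{s=1}^{n-1} \int_{2\pi s/n}^{2\pi(s+1)/n} \tau(\lambda,\mu)\big(I_{n,\varepsilon\varepsilon}(\mu_s)-I_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu \Big|
\le \sum_{s=1}^{n-1} \int_{2\pi s/n}^{2\pi(s+1)/n} \Big( n\,\mathrm{var}\big(I_{n,\varepsilon\varepsilon}(\mu_s)-I_{n,\varepsilon\varepsilon}(\mu)\big) \Big)^{1/2} d\mu
\]

where, from Brillinger (1981), p. 417, $n\,\mathrm{var}\big(I_{n,\varepsilon\varepsilon}(\mu_s) - I_{n,\varepsilon\varepsilon}(\mu)\big) = O(n^{-1})$ uniformly on
$2\pi s/n \le \mu \le 2\pi(s+1)/n$. This shows that

\[
\sqrt{n}\,E\,\Big| \frac{2\pi}{n}\sum_{s=1}^{n-1} \tau(\lambda,\mu_s)\,I^{\circ}_{n,\varepsilon\varepsilon}(\mu_s) - \int_{0}^{2\pi} \tau(\lambda,\mu)\,I^{\circ}_{n,\varepsilon\varepsilon}(\mu)\,d\mu \Big| = O(n^{-1/2})
\]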

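uniformly in $\lambda$. We can therefore consider

\[
\sqrt{n}\int_{-\pi}^{\pi} \Big| \frac{2\pi}{n}\sum_{s=1}^{n-1} \tau(\lambda,\mu_s)\,I^{\circ}_{n,\varepsilon\varepsilon}(\mu_s) \Big|\,d\lambda
\]

which is bounded in probability by Markov's inequality if $\sup_{\lambda}\, nE\,\big|\frac{2\pi}{n}\sum_{s=1}^{n-1}\tau(\lambda,\mu_s)\,I^{\circ}_{n,\varepsilon\varepsilon}(\mu_s)\big|^2$
is bounded. From Brillinger (1981), Theorem 5.10.1,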

2 


nE 


n-l 


—  ^T(A,/xJ/°_^^(AiJ 
s=l 

Jo 

+   /       /      /e..e(/^l,Ai2>-j"l)c?/^l<^M2+C>(n    ^) 

Jo   o/o 
where the error is uniform in $\lambda$. Then $f_{\varepsilon\varepsilon}(\lambda) = \sigma^2$ and $f_{\varepsilon\cdots\varepsilon}(\mu_1,\mu_2,-\mu_1)$ is uniformly
bounded under assumption (A-1).

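From Brillinger (1981), Theorems 5.10.1 and 5.10.2, it follows immediately that

\[
\sqrt{n}\,E\,\Big| \int_{-\pi}^{\pi} \big(I_{n,\varepsilon\varepsilon}(\mu)-EI_{n,\varepsilon\varepsilon}(\mu)\big)\,d\mu \Big|
\]

is bounded such that the second term in (10) is small in probability on $\mathcal{H}_\delta$. This completes
the proof. ■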
Table 1: Relative efficiency of OLS for ARCH(1) innovations

Table 2: Means and Medians

Figure 1: Asymptotic efficiency of OLS relative to the IV estimator as a function of the
parameter $\phi$. Generating mechanisms considered are from bottom to top: $\gamma_1 = .5$, $\gamma_1 = .4$,
$\gamma_1 = .3$ and $\gamma_1 = .2$.

Figure 2: Empirical density of parameter estimates for an AR(1) model with $\phi = .9$ when
the errors have no ARCH effects

Figure 3: Empirical densities of estimated AR parameters when $\phi = .9$ and $\gamma_1 = .9$